The Impact of Renewable Energy Adoption on CO2 Emission Reductions

Author

Faber Bickerstaffe, Antoine Magnin, and Robin Michel

Published

December 22, 2023

1 Introduction

1.1 Overview and Motivation

Renewable energy sources are becoming integral to addressing climate change and promoting sustainability. In recent years, significant advances in renewable energy technologies have positioned them as key solutions to the world’s environmental challenges. This growing reliance on renewable energy makes it important to understand how much it actually helps society achieve sustainability and combat climate change.

Our motivation for this project stems from the pressing need to tackle global environmental challenges, particularly reducing greenhouse gas emissions and improving air quality. We are driven by a strong research interest in environmental science and sustainable energy technologies. Our aim is to provide data-driven insights into how sustainable energy generation affects CO2 emissions and exposure to fine particulate matter (PM2.5). Through this research, we hope to contribute to the fields of environmental science and energy policy, and to a cleaner and more sustainable future.

The benefits of this research extend to:

  • Policy Development: By quantifying the impact of sustainable energy, our findings can inform more effective environmental policies and strategies.
  • Technology Advancement: Identifying the most effective types of sustainable energy sources can encourage innovation and investment in those areas.
  • Public Awareness: Our research contributes to public knowledge, highlighting the importance of sustainable energy in combating environmental issues.

From this project, we expect to extract crucial information, including:

  • Statistical Evidence: Quantitative data on the relationship between sustainable energy generation and CO2 emission levels.
  • Comparative Analysis: A comparison of the impact of different sustainable energy sources on environmental metrics.
  • Trend Analysis: Identifying patterns and trends over time in the adoption of sustainable energy and its effects on CO2 emissions and air quality.

1.2 Project Objectives

The main objectives of our project are the following:

  1. Quantify Impact: Determine the extent to which sustainable energy generation affects CO2 emissions.
  2. Identify Key Factors: Uncover which types of sustainable energy sources have the most significant impact on reducing emissions.
  3. Assess Air Quality Effects: Determine how the effect of sustainable energy on CO2 emissions contributes to the improvement of air quality (reduction of PM2.5).

1.3 Research Questions

  1. To what extent does sustainable energy generation impact carbon dioxide emissions?
  2. What are the key factors and types of sustainable energy sources that have the most significant impact on reducing CO2 emissions?
  3. How does sustainable energy generation impact air quality (reduction of PM2.5)?
  4. Is there a temporal trend linking the growth of sustainable energy generation to the reduction in CO2 emissions, and what implications does this trend hold for future sustainability efforts?

2 Data

2.1 Data sources and description

2.1.1 Renewable energy generations

We collected our first dataset from the “Energy Institute”, whose Statistical Review provides a full database on energy generation and emissions. We used it to gather each country’s generation from wind, solar, hydro, geothermal, biomass and other sources, as well as from nuclear power. Strictly speaking, nuclear power is low-carbon rather than renewable; in this report we group it with the renewable sources. The database came as an .xlsx workbook with 62 sheets, which we had to convert into data frames.

Source: https://www.energyinst.org/statistical-review/resources-and-data-downloads

From this extensive database, we focused on the variables aligned with our project objectives (Table 1.1).

Table 1.1: Energy Generation Sources Database

| Variables | Meaning |
|---|---|
| Country | Country |
| Year | Year |
| Geo Biomass Other - TWh | TWh generation from Geothermal, Biomass and other sources |
| Hydro Generation - TWh | TWh generation from Hydro |
| Nuclear Generation - TWh | TWh generation from Nuclear |
| Solar Generation - TWh | TWh generation from Solar |
| Wind Generation - TWh | TWh generation from Wind |

2.1.2 CO2 Emissions

Our second dataset, sourced from “Climate Watch”, provides annual CO2 emissions per capita by country. It enabled us to compare carbon footprints across nations and relate them to each country’s sustainable energy generation.

Source: https://www.climatewatchdata.org/data-explorer

Table 1.2: CO2 Emissions Database

| Variables | Meaning |
|---|---|
| Country | Country |
| Year | Year |
| CO2 emissions | Annual CO₂ emissions per capita (tonnes) |

2.1.3 PM2.5 exposure

The third key dataset was obtained from “The World Bank” and reports PM2.5 levels. PM2.5 refers to particulate matter 2.5 micrometers or smaller in diameter. These fine particles are a significant public-health concern because they can penetrate deep into the lungs and enter the bloodstream, potentially causing serious health problems. Monitoring PM2.5 levels is therefore essential for assessing air quality and understanding the health impacts of air pollution. By analyzing this data, we aim to explore the relationship between energy generation methods and air quality, in particular whether the adoption of sustainable energy sources is associated with lower PM2.5 levels.

Source: https://databank.worldbank.org/reports.aspx?source=2%20&series=EN.ATM.PM25.MC.M3&country=#

Table 1.3: PM2.5 Exposure Database

| Variables | Meaning |
|---|---|
| Country | Country |
| Year | Year |
| PM2.5 | Mean annual exposure to particulate matter 2.5 micrometers or smaller in diameter (µg/m³) |

2.1.4 Population

In order to accurately calculate our renewable energy generation per capita, we have also incorporated a comprehensive global population database, sourced again from “The World Bank”. This additional data enables us to normalize energy generation figures against population sizes, ensuring a more precise and meaningful analysis of renewable energy’s impact on a per capita basis.

Source: https://databank.worldbank.org/reports.aspx?source=2&series=SP.POP.TOTL&country=#

Table 1.4: Population Database

| Variables | Meaning |
|---|---|
| Country | Country |
| Year | Year |
| Population | Number of inhabitants in the country |

2.2 Data wrangling

During the data wrangling phase, we processed four distinct datasets, each requiring a tailored approach to cleaning and wrangling. This was necessary to make the datasets suitable for our subsequent analysis. The end result of these efforts was a consolidated final dataframe named raw_df.

2.2.1 Merged_renew dataframe

The first dataframe we focused on was dedicated to renewable energy. It originated from an Excel file made up of 62 sheets. To access the pertinent data, we extracted five specific sheets, each corresponding to one source of energy generation; all five were in wide format, with one column per year.

Code
#    merged_renew    #
######################

# Load the Excel reader and specify the file path
library(readxl)
path_renew <- here::here("Data/Data_sustainable_energy.xlsx")

# Define the names of the excel sheets
Renewables_sheets <- c("Wind Generation - TWh", "Solar Generation - TWh", "Geo Biomass Other - TWh", "Hydro Generation - TWh", "Nuclear Generation - TWh")

# Read the specific sheets into a list of dataframes
Renewables_list <- lapply(Renewables_sheets, function(sheet_name) {
  read_xlsx(path_renew, sheet = sheet_name)
})

# Name the dataframes
names(Renewables_list) <- Renewables_sheets

# Bring the dataframes to the global environment (one object per sheet name)
list2env(Renewables_list, envir = .GlobalEnv)

Following the initial data extraction, our next step was to refine the structure of the dataframes. As the data originated from Excel sheets containing buffer rows unsuitable for analysis, we removed these entries. Our largest challenge was that the dataframes were organized in wide format. We needed a reusable function to transform them into long format, which also had to set the type of the year variable and remove superfluous columns, so that all five sources share a standardized long format.

Code
# Function to pivot our df
process_data <- function(df) {

  # The first row holds the years: coerce them to integers and use them as column names
  # (non-year header cells become NA, hence suppressWarnings)
  colnames(df) <- suppressWarnings(as.integer(as.character(df[1, ])))
  
  # Remove the first row (it only held the years)
  df <- df[-1, ]
  
  # Rename the first column to "country"
  colnames(df)[1] <- "country"
  
  # Drop the last 3 columns, which we do not use
  df <- df[, head(seq_along(df), -3)]
  
  # Pivot to long format: one row per country and year
  long_format_df <- df %>%
    tidyr::pivot_longer(
      cols = -country,                            # all columns except "country"
      names_to = "year",                          # column names go into "year"
      names_transform = list(year = as.integer),  # store years as integers
      values_to = "TWh"                           # values go into "TWh"
    )
  
  return(long_format_df)
}

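# Use process_data on each cleaned sheet to get our pivoted dfs
# (Cleaned_Renewables_list holds the sheets of Renewables_list after the buffer-row removal described above; that step is not shown)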
wind_df <- process_data(Cleaned_Renewables_list[[1]])
solar_df <- process_data(Cleaned_Renewables_list[[2]])
geo_df <- process_data(Cleaned_Renewables_list[[3]])
hydro_df <- process_data(Cleaned_Renewables_list[[4]])
nuclear_df <- process_data(Cleaned_Renewables_list[[5]])
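The reshaping done by process_data() can be illustrated without the Excel file. The sketch below is a minimal base-R equivalent of its pivot_longer() step, assuming a hypothetical toy sheet in the same wide layout (one column per year, made-up values). It is an illustration rather than the code we ran, and its row order differs from pivot_longer()'s.

```r
# Hypothetical toy sheet in the wide layout of the Energy Institute tabs:
# one row per country, one column per year (values in TWh, made up).
wide <- data.frame(
  country = c("Chile", "Peru"),
  `2019`  = c(5.1, 1.8),
  `2020`  = c(5.6, 1.9),
  check.names = FALSE
)

# Stack every year column into (country, year, TWh) rows, which is what
# tidyr::pivot_longer(cols = -country, ...) does inside process_data().
year_cols <- setdiff(names(wide), "country")
long <- data.frame(
  country = rep(wide$country, times = length(year_cols)),
  year    = rep(as.integer(year_cols), each = nrow(wide)),
  TWh     = unlist(wide[year_cols], use.names = FALSE)
)
long
```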

The final step before merging our five renewable energy dataframes was to rename their value column, initially called “TWh” in all of them, to <source>_generation (e.g. wind_generation) so that the sources could be told apart. We then merged the five dataframes into merged_renew.

Code
#      Merge our renewables energies df's    #
##############################################

# Rename columns in the original dataframes
wind_df <- wind_df %>%
  rename(wind_generation = TWh)
solar_df <- solar_df %>%
  rename(solar_generation = TWh)
geo_df <- geo_df %>%
  rename(geo_generation = TWh)
hydro_df <- hydro_df %>%
  rename(hydro_generation = TWh)
nuclear_df <- nuclear_df %>%
  rename(nuclear_generation = TWh)

# Sequentially merge the dataframes (all = FALSE keeps only country-year pairs present in every source)
merged_renew <- wind_df %>%
  merge(solar_df, by = c("country", "year"), all = FALSE) %>%
  merge(geo_df, by = c("country", "year"), all = FALSE) %>%
  merge(hydro_df, by = c("country", "year"), all = FALSE) %>%
  merge(nuclear_df, by = c("country", "year"), all = FALSE)
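Because merge() is called with all = FALSE, each step is an inner join: a country-year survives only if it appears in every source. The sketch below, on two hypothetical toy sources, shows what that discards compared with a full outer join (all = TRUE).

```r
# Two hypothetical long-format sources that do not cover the same years.
wind  <- data.frame(country = c("Chile", "Chile", "Peru"),
                    year = c(2019, 2020, 2020),
                    wind_generation = c(4.8, 5.6, 1.9))
solar <- data.frame(country = c("Chile", "Peru"),
                    year = c(2020, 2020),
                    solar_generation = c(7.6, 0.9))

inner <- merge(wind, solar, by = c("country", "year"), all = FALSE)  # inner join
outer <- merge(wind, solar, by = c("country", "year"), all = TRUE)   # full outer join

nrow(inner)  # 2: Chile 2019 is dropped because solar has no such row
nrow(outer)  # 3: Chile 2019 is kept, with solar_generation = NA
```

The inner join keeps merged_renew free of rows that exist for only some sources; an outer join would keep them, filled with NA.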

Once the dataframe was merged, we noticed that some countries were consistently missing values. On closer inspection, all of these countries had undergone a geopolitical change that altered data collection. For example, many former Soviet republics have no separate values for the earlier years, when they were part of the USSR.

Code
# Convert "year" to numeric and drop the years before each country was reported separately
merged_renew <- merged_renew %>%
  mutate(year = as.numeric(year)) %>%
  filter(!(
    # Former Yugoslav republics: no separate data before 1990
    (country %in% c("Slovenia", "Croatia", "North Macedonia") & year < 1990) |
    # Former Soviet republics: no separate data before 1984
    (country %in% c("Azerbaijan", "Belarus", "Latvia", "Lithuania", "Kazakhstan", "Russian Federation",
                    "Ukraine", "Estonia", "Turkmenistan", "Uzbekistan") & year < 1984) |
    # Bangladesh: part of Pakistan until 1971, no separate data before 1970
    (country == "Bangladesh" & year < 1970)
  ))
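Since %in% silently matches nothing when a name is misspelled (for example "Turkemenistan" instead of "Turkmenistan"), a filter like the one above can fail without raising any error. A small check such as the hypothetical check_names() helper below, run against the country names actually present in the data, would flag this. The toy vector stands in for unique(merged_renew$country).

```r
# Hypothetical helper: warn about names in a filter that never occur in the data
check_names <- function(wanted, available) {
  absent <- setdiff(wanted, available)
  if (length(absent) > 0) {
    warning("Not found in data: ", paste(absent, collapse = ", "))
  }
  invisible(absent)
}

# Toy stand-in for unique(merged_renew$country)
countries_in_data <- c("Turkmenistan", "Uzbekistan", "Ukraine")

check_names(c("Turkemenistan", "Uzbekistan"), countries_in_data)
# warns that "Turkemenistan" was not found
```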

2.2.2 CO2 emissions, population and PM2.5 dataframes

For co2_df, Pop_df and PM_df, we performed typical wrangling steps: removing missing values, renaming columns to ensure a consistent structure, and pivoting. One particular challenge was the format of the years, which we cleaned with the gsub function: for example, “1965” appeared as “1965[YR1965]”.

Code
#      co2_df      #
####################

# Rename columns of co2_df
co2_df <- co2_df %>%
  rename(`CO2_emissions` = `Annual CO₂ emissions (per capita)`) %>%
  rename(country = Entity) %>%
  rename(year = Year)

# Convert the "year" column to numeric in co2_df
co2_df <- co2_df %>%
  mutate(year = as.numeric(year))

# Check whether co2_df contains missing values (NA or "..")
missing_values <- any(is.na(co2_df) | co2_df == "..")

Note: Similar steps were followed for PM_df and Pop_df.
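The year-label cleanup can be sketched as follows. We assume DataBank-style labels such as "1965 [YR1965]" (with or without the space) and ".." as the missing-value marker, as in the check above; the regular expression keeps only the leading four-digit year. This is an illustration, not the exact gsub() call used in our scripts.

```r
# Year labels as exported; the leading four digits are the year
raw_labels <- c("1965 [YR1965]", "1990[YR1990]", "2021 [YR2021]")
years <- as.integer(gsub("^([0-9]{4}).*$", "\\1", raw_labels))
years  # 1965 1990 2021

# ".." marks a missing value: turn it into NA before converting,
# so that as.numeric() does not warn about coercion
values <- c("12.3", "..", "9.8")
clean  <- as.numeric(replace(values, values == "..", NA))
clean  # 12.3 NA 9.8
```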

2.2.3 raw_df

Once our four dataframes (merged_renew, co2_df, PM_df, Pop_df) were wrangled and cleaned, we began merging them. However, after merging we noticed that some important countries were missing. Checking the datasets revealed that country names were not standardized across the four dataframes, so we standardized them to match the naming used in merged_renew.

Code
#   raw_df    #
###############

#      Country name replacements for Pop, CO2, PM dfs
# --> standardize country names on the merged_renew naming, as some names differ between the datasets

# List of country name replacements
country_replacements <- c(
  "United States" = "US", "Czechia" = "Czech Republic", "Egypt, Arab Rep." = "Egypt",
  "Iran, Islamic Rep." = "Iran", "Russia" = "Russian Federation",
  "Slovak Republic" = "Slovakia", "Korea, Rep." = "South Korea",
  "Trinidad and Tobago" = "Trinidad & Tobago", "Turkiye" = "Turkey",
  "Venezuela, RB" = "Venezuela")

# Define a function to apply country renaming to a dataset
rename_countries <- function(df) {
  df %>%
    mutate(country = ifelse(country %in% names(country_replacements),
                            country_replacements[country],
                            country))
}

# Apply the renaming to the Pop_df, CO2_df, and PM_df
Pop_df <- rename_countries(Pop_df)
co2_df <- rename_countries(co2_df)
PM_df <- rename_countries(PM_df)
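Before joining, unmatched names can be listed directly with setdiff(). In the toy example below, hypothetical name vectors stand in for the country columns of merged_renew and co2_df. After applying a replacement map in the same way as rename_countries(), the leftover set should be empty; any remaining entry would receive NA after the left join.

```r
# Hypothetical name vectors standing in for the country columns of two datasets
renew_countries <- c("US", "Egypt", "Russian Federation", "South Korea")
co2_countries   <- c("United States", "Egypt", "Russia", "South Korea")

# Names in merged_renew with no exact match in the other dataset
setdiff(renew_countries, co2_countries)  # "US" "Russian Federation"

# Apply a (hypothetical) replacement map, then re-check
toy_replacements <- c("United States" = "US", "Russia" = "Russian Federation")
fixed <- ifelse(co2_countries %in% names(toy_replacements),
                toy_replacements[co2_countries], co2_countries)
setdiff(renew_countries, fixed)  # character(0): every name now matches
```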

We then merged the four dataframes into raw_df by left-joining PM_df, co2_df and Pop_df onto merged_renew, so that every country-year of merged_renew is kept.

Code
#     Merge PM_df, CO2_df, Pop_df with merged_renew     #
#########################################################

raw_df <- merged_renew %>%
  left_join(PM_df, by = c("country", "year")) %>%
  left_join(co2_df, by = c("country", "year")) %>%
  left_join(Pop_df, by = c("country", "year"))
# raw_df contains all the renewable energy generation, co2 emissions per capita and PM exposure of all countries and all years from 1965 to 2022

# Arrange column order to have dependent variables first (right after country and year)
raw_df <- raw_df %>%
  relocate(CO2_emissions, PM_exposure, .after = year)

In raw_df, we noticed that some observations were not countries but regions or groupings of countries, such as continents and organisations (e.g. the OECD). We reviewed all non-country observations and removed them, ending up with the following dataframe:

| country | year | CO2_emissions | PM_exposure | wind_generation | solar_generation | geo_generation | hydro_generation | nuclear_generation | Population |
|---|---|---|---|---|---|---|---|---|---|
| New Zealand | 1972 | 5.543 | NA | 0.000 | 0.000 | 1.257 | 15.265 | 0.0 | 2.90e+06 |
| Canada | 2007 | 18.061 | NA | 3.007 | 0.026 | 8.991 | 367.621 | 92.8 | NA |
| Portugal | 2009 | 5.401 | NA | 7.577 | 0.160 | 2.271 | 8.285 | 0.0 | NA |
| United Kingdom | 1979 | 11.455 | NA | 0.000 | 0.000 | 0.000 | 4.303 | 38.3 | 5.62e+07 |
| Iceland | 1979 | 8.780 | NA | 0.000 | 0.000 | 0.046 | 2.819 | 0.0 | 2.26e+05 |
| Philippines | 1983 | 0.675 | NA | 0.000 | 0.000 | 4.935 | 3.968 | 0.0 | NA |
| Portugal | 1968 | 1.473 | NA | 0.000 | 0.000 | 0.228 | 5.188 | 0.0 | 8.84e+06 |
| Iraq | 1993 | 3.272 | NA | 0.000 | 0.000 | 0.000 | 6.162 | 0.0 | 1.93e+07 |
| United Kingdom | 2009 | 7.938 | NA | 9.281 | 0.020 | 10.715 | 5.228 | 69.1 | NA |
| Chile | 2020 | 4.344 | NA | 5.602 | 7.615 | 7.152 | 21.721 | 0.0 | 1.93e+07 |
| France | 1974 | 9.982 | NA | 0.000 | 0.000 | 1.577 | 56.230 | 14.7 | 5.34e+07 |
| Lithuania | 2001 | 3.547 | NA | 0.000 | 0.000 | 0.002 | 0.326 | 11.4 | 3.47e+06 |
| Philippines | 2015 | 1.091 | 19.5 | 0.748 | 0.139 | 11.411 | 8.665 | 0.0 | 1.03e+08 |
| Algeria | 1968 | 0.686 | NA | 0.000 | 0.000 | 0.000 | 0.563 | 0.0 | 1.32e+07 |
| Peru | 2001 | 0.999 | NA | 0.001 | 0.000 | 0.163 | 17.615 | 0.0 | 2.70e+07 |

Note: Sample of raw_df. For PM_exposure, we only have observations for 1990, 1995, 2000, 2005, 2010-2019, hence the NAs for that column.
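Removing the aggregate rows amounts to filtering on a list of non-country labels. The sketch below uses hypothetical labels; the actual list has to be read off unique(raw_df$country). Pattern matching with grep() can help shortlist candidates, but each one should be checked by hand before removal.

```r
# Hypothetical aggregate labels; the real list must come from unique(raw_df$country)
aggregates <- c("Total World", "Total Europe", "OECD", "Non-OECD")

toy <- data.frame(country = c("Chile", "Total World", "OECD", "Peru"), year = 2020)

# Shortlist candidates by pattern, then confirm them by hand
grep("Total|OECD", unique(toy$country), value = TRUE)  # "Total World" "OECD"

# Keep only the rows whose label is not an aggregate
countries_only <- toy[!toy$country %in% aggregates, ]
countries_only$country  # "Chile" "Peru"
```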

2.2.4 Filtering raw_df for deeper analysis

With raw_df, we have all of the essential data needed for deeper analysis: 77 countries observed from 1965 to 2021, for 4215 observations of 10 variables. We used this dataframe as the basis for further analysis.

From raw_df, we created a per capita dataframe, all_df, by dividing the renewable energy variables by population. all_df is our second main dataframe; it differs from raw_df in that all renewable energy variables are expressed per capita. We used it when comparing CO2 emissions, which are already per capita, with renewable energy generation on a common scale.

Code
#   all_df    #
###############

#  Drop any row without a Population  

all_df <- raw_df %>% 
  filter(!is.na(Population)) # --> 77 countries
# raw_df keeps total energy generation, CO2 emissions and PM exposure for all years (independent of population);
# all_df drops rows with missing population so that generation can be computed per capita


#    Computing renewable energy generation variables per capita     #
#####################################################################


# Divide each generation variable (TWh) by population to obtain TWh per person.
# CO2_emissions is already per capita and PM_exposure is a concentration (µg/m³),
# so neither is divided; NA values simply stay NA.
all_df <- all_df %>%
  mutate(across(ends_with("_generation"), ~ .x / Population))
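To give a sense of scale, the worked example below applies the per capita division to the New Zealand 1972 hydro value from the raw_df sample above, using the rounded population shown there. Values in TWh per person are of the order of 1e-06, which is why they are hard to read. Converting to kWh per person (1 TWh = 1e9 kWh) is shown only as an alternative reading; it is not what all_df stores.

```r
# Worked example: New Zealand, 1972 (raw_df sample; population rounded to 2.90e+06)
hydro_twh  <- 15.265   # hydro generation, TWh
population <- 2.90e6   # inhabitants

hydro_per_capita_twh <- hydro_twh / population      # TWh per person, as in all_df
hydro_per_capita_kwh <- hydro_per_capita_twh * 1e9  # 1 TWh = 1e9 kWh

signif(hydro_per_capita_twh, 3)  # 5.26e-06
round(hydro_per_capita_kwh)      # 5264
```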

After creating all_df, we created several sub-dataframes by filtering on years and separating developed from non-developed countries:

  • All Countries
  • All Countries (Since 1990)
  • Developed Countries Only
  • Developed Countries (Since 1990)
  • Non-Developed Countries Only
  • Non-Developed Countries (Since 1990)

Additionally, we multiplied the values of each of these six datasets by 100. Because per capita values are very small, this scaling makes the log transformations in our analysis easier to read and interpret.

Code
#     df from 1990 to 2021       #
##################################

raw_1990 <- raw_df %>%
  filter(year >= 1990, year <= 2021)

all_1990 <- all_df %>%
  filter(year >= 1990, year <= 2021)
# We start in 1990 because renewable generation was limited before then, which makes a significant effect hard to detect

#    df with only developed countries    #
###########################################


# Create a list of developed countries
developed_countries <- c("Australia", "Austria", "Belgium", "Canada", "Cyprus", 
                         "Czech Republic", "Denmark", "Estonia", "Finland", "France", 
                         "Germany", "Greece", "Hungary", "Iceland", "Ireland", "Israel", 
                         "Italy", "Japan", "Latvia", "Lithuania", "Luxembourg", 
                         "Netherlands", "New Zealand", "Norway", "Poland", "Portugal", 
                         "Slovakia", "Slovenia", "Spain", "Sweden", "Switzerland", 
                         "United Kingdom", "US", "South Korea", "Singapore", "Qatar", 
                         "United Arab Emirates", "Mexico", "Saudi Arabia", "China")

developed_df <- all_df %>% filter(country %in% developed_countries) # --> 40 countries with per capita values
rawdev_df <- raw_df %>% filter(country %in% developed_countries) # --> 40 countries with total values

#    df with only developed countries from 1990    #
#####################################################

developed_1990 <- developed_df %>%
  filter(year >= 1990, year <= 2021)

rawdev_1990 <- rawdev_df %>%
  filter(year >= 1990, year <= 2021)

#    df with non-developed countries    #
###########################################

nondev_df <- all_df[!all_df$country %in% developed_countries, ] # --> 37 countries with per capita values
rawnondev_df <- raw_df[!raw_df$country %in% developed_countries, ] # --> 37 countries with total values


#    df with non-developed countries from 1990   #
###################################################

nondev_1990 <- nondev_df %>%
  filter(year >= 1990, year <= 2021)
  
rawnondev_1990 <- rawnondev_df %>%
  filter(year >= 1990, year <= 2021)

#        multiply df by 100          #
######################################
# This makes the log scaling more readable in our analysis, as per capita values are very low
# Columns 3:9 are CO2_emissions, PM_exposure and the five generation variables (Population is unchanged)

# all100_df : multiply all_df by 100
all100_df <- all_df %>%
  mutate(across(
    .cols = 3:9,
    .fns = ~ . * 100
  ))

# raw100_df : multiply raw_df by 100
raw100_df <- raw_df %>%
  mutate(across(
    .cols = 3:9,
    .fns = ~ . * 100
  ))

# raw100_1990 : keep only years from 1990 of raw100_df
raw100_1990 <- raw100_df %>%
  filter(year >= 1990, year <= 2021)

# rawdev100_1990 : keep only years from 1990 and developed countries

rawdev100_1990 <- rawdev_df %>%
  mutate(across(
    .cols = 3:9,
    .fns = ~ . * 100
  ))  %>%
  filter(year >= 1990, year <= 2021)

# rawnondev100_1990 : keep only years from 1990 and non-developed countries

rawnondev100_1990 <- rawnondev_df %>%
  mutate(across(
    .cols = 3:9,
    .fns = ~ . * 100
  ))  %>%
  filter(year >= 1990, year <= 2021)

# Per capita versions from 1990, used in the exploratory data analysis
all100_1990       <- all_1990 %>% mutate(across(3:9, ~ . * 100))
alldev100_1990    <- developed_1990 %>% mutate(across(3:9, ~ . * 100))
allnondev100_1990 <- nondev_1990 %>% mutate(across(3:9, ~ . * 100))

# Remove the intermediate df / lists
rm(Pop_df, PM_df, co2_df, merged_renew, Cleaned_Renewables_list, geo_df, `Geo Biomass Other - TWh`, `Hydro Generation - TWh`, hydro_df, `Nuclear Generation - TWh`, nuclear_df, Renewables_list, `Solar Generation - TWh`, solar_df, `Wind Generation - TWh`, wind_df)
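On a log scale, multiplying by 100 adds a constant: log10(100x) = 2 + log10(x). The short check below, on toy values, illustrates that the rescaling shifts every observation by two decades and leaves the gaps between observations, and hence the shape of a log-scaled distribution, unchanged. The benefit is more readable axis labels rather than a different picture of the data.

```r
x <- c(5.26e-06, 1.2e-04, 3.4e-03)  # toy per capita values

shift <- log10(100 * x) - log10(x)
shift  # 2 2 2, up to floating-point rounding

# Differences between consecutive log values are unchanged by the rescaling
all.equal(diff(log10(100 * x)), diff(log10(x)))  # TRUE
```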

3 Exploratory Data Analysis

We now turn to the exploratory data analysis (EDA), which lays the groundwork for answering our research questions. As described in the data section, we consolidated our principal variables into a single dataframe, raw_df. It keeps this name because it presents the data in its original form, without scaling by population size, which is useful for analyzing overall patterns. We also created all_df, which expresses the data per capita for all 77 countries. These two dataframes form the basis of our EDA.

Code

# Creating the table
data_table <- tibble(
  Variables = c("Country", "Year", "CO2 emissions", "PM2.5", "Wind Generation - TWh", "Solar Generation - TWh", "Geo Biomass Other - TWh", "Hydro Generation - TWh", "Nuclear Generation - TWh", "Population"),
  Meaning = c("Country", "Year", "Annual CO₂ emissions per capita (tonnes)", "Mean annual PM2.5 exposure (µg/m³)", "TWh generation from Wind", "TWh generation from Solar", "TWh generation from Geothermal, Biomass and other sources", "TWh generation from Hydro", "TWh generation from Nuclear", "Number of inhabitants in the country")
)

# Rendering the table with knitr::kable
knitr::kable(data_table, format = "html", caption = "Table 3.1: raw_df")
Table 3.1: raw_df

| Variables | Meaning |
|---|---|
| Country | Country |
| Year | Year |
| CO2 emissions | Annual CO₂ emissions per capita (tonnes) |
| PM2.5 | Mean annual PM2.5 exposure (µg/m³) |
| Wind Generation - TWh | TWh generation from Wind |
| Solar Generation - TWh | TWh generation from Solar |
| Geo Biomass Other - TWh | TWh generation from Geothermal, Biomass and other sources |
| Hydro Generation - TWh | TWh generation from Hydro |
| Nuclear Generation - TWh | TWh generation from Nuclear |
| Population | Number of inhabitants in the country |

We will begin the EDA by identifying noticeable trends and data structure, contrasting all countries with developed and non-developed countries separately. This will help us understand how the effects of renewable energy generation may differ depending on a country’s development. We will then examine renewable energy generation and CO2 emissions in detail, from both a global and an individual-country perspective. Lastly, we will explore the relationship between renewable energy generation, CO2 emissions, and PM2.5 levels.

3.1 Data distribution

3.1.1 Histogram plot

We begin our analysis by examining the spread of our numerical data. To visualize this, we have created a histogram that displays the distribution across all countries, as well as separately for developed and developing nations. This initial comparison will offer us a glimpse into how renewable energy generation and CO2 emissions vary among countries with diverse economic and demographic profiles.

Code
### Facet grid for histogram
library(ggplot2)
library(reshape2) # provides melt()

# Reshape our dataframes into long format to create a histogram (dropping year and Population)
longall_df <- melt(all100_1990[, -c(2,10)]) # for all countries
longdev_df <- melt(alldev100_1990[, -c(2,10)]) # for developed countries
longnondev_df <- melt(allnondev100_1990[, -c(2,10)]) # for non-developed countries

# Add a new column to each to indicate the development status
longall_df$Status <- 'All'
longdev_df$Status <- 'Developed'
longnondev_df$Status <- 'Non-Developed'

# Combine the three dataframes into one
hcombined_df <- rbind(longall_df,longdev_df, longnondev_df)

# Create histograms with facet_grid to ensure same scales
h <- ggplot(hcombined_df, aes(x = value)) +
  geom_histogram(bins = 30, fill = "blue", color = "black") +
  facet_grid(Status ~ variable, scales = "free_x", 
             labeller = labeller(variable = c(
               "CO2_emissions" = "CO2 Emissions",
               "PM_exposure" = "PM2.5 Exposure",
               "wind_generation" = "Wind Generation",
               "solar_generation" = "Solar Generation",
               "geo_generation" = "Geo Biomass",
               "hydro_generation" = "Hydro Generation",
               "nuclear_generation" = "Nuclear Generation"))) + 
  scale_x_log10() + # Apply log scale for x
  theme_minimal() +
  labs(x = "Value × 100 (log scale; generation and CO2 per capita)", y = "Count") +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, size = 6),
    axis.text.y = element_text(size = 6),
    axis.title.x = element_text(size = 9),
    axis.title.y = element_text(size = 9),
    strip.text.x = element_text(size = 5), 
    strip.text.y = element_text(size = 8),
    plot.title = element_text(hjust = 0.5, size = 12)
  ) +
  ggtitle("Distribution of variables by development status")

Note: For this histogram, we used our per capita dataframe all_df multiplied by 100 (all100_df) to make low per capita values more visible; this adjustment also makes the log-scaled axis easier to read. The analysis focuses on data from 1990 to 2021 (all100_1990), the period in which renewable energy generation started to gain prominence globally. These choices help highlight differences between developed and non-developed countries.

We observe differences between developed and non-developed countries. Developed countries generally exhibit higher per capita values for both CO2 emissions and energy generation, whereas PM2.5 values appear relatively similar in both groups.
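One caveat of the log-scaled x axis: log10(0) is -Inf, so observations with zero generation cannot be drawn, and ggplot2 typically drops such non-finite values with a warning about removed rows. Since many country-years have zero wind or solar generation, it may be worth reporting the share of zeros next to the histogram. The toy vector below is hypothetical.

```r
# Hypothetical per capita solar values (x 100), several years with zero generation
solar <- c(0, 0, 0, 0.002, 0.05, 0.3)

logged <- log10(solar)
sum(!is.finite(logged))  # 3: log10(0) is -Inf, so these cannot be placed on a log axis
mean(solar == 0)         # 0.5: share of observations a log-scaled histogram leaves out
```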

3.1.2 Violin plot

To highlight the shift towards higher values in developed countries, especially for CO2 emissions and energy generation, we created the following violin plot.

Code
#### Facet grid for violin plot

# Melt and add development status (dropping year, PM_exposure and Population: columns 2, 4 and 10)
longdev_df <- melt(alldev100_1990[, -c(2, 4, 10)], variable.name = 'EnergySource', value.name = 'value') %>%
  mutate(DevelopmentStatus = 'Developed')

longnondev_df <- melt(allnondev100_1990[, -c(2, 4, 10)], variable.name = 'EnergySource', value.name = 'value') %>%
  mutate(DevelopmentStatus = 'Non-Developed')

# Set column names to be displayed in the violin plot 
col_names <- c("CO2_emissions" = "CO2 Emissions",
               "PM_exposure" = "PM2.5 Exposure",
               "wind_generation" = "Wind Generation",
               "solar_generation" = "Solar Generation",
               "geo_generation" = "Geo Generation",
               "hydro_generation" = "Hydro Generation",
               "nuclear_generation" = "Nuclear Generation")

# Rename columns in the melted data frames
longdev_df$EnergySource <- factor(longdev_df$EnergySource, levels = names(col_names), labels = col_names)
longnondev_df$EnergySource <- factor(longnondev_df$EnergySource, levels = names(col_names), labels = col_names)

# Combine the datasets
vcombined_df <- bind_rows(longdev_df, longnondev_df)

# Filter out extreme values per variable with the 1.5 x IQR rule
# (na.rm = TRUE because quantile() and IQR() stop on missing values otherwise)
vcombined_df <- vcombined_df %>%
  group_by(EnergySource) %>%
  mutate(
    lower_bound = quantile(value, 0.25, na.rm = TRUE) - 1.5 * IQR(value, na.rm = TRUE),
    upper_bound = quantile(value, 0.75, na.rm = TRUE) + 1.5 * IQR(value, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  filter(value > lower_bound & value < upper_bound) %>%
  select(-lower_bound, -upper_bound)

# Create a violin plot with a facet grid
v <- vcombined_df %>%
  mutate(EnergySource = fct_reorder(EnergySource, value)) %>%

  ggplot(aes(x=EnergySource, y=value, fill=DevelopmentStatus)) +
  geom_violin(trim=FALSE, position=position_dodge(width=0.8), size=0.2) +
  scale_fill_viridis(discrete=TRUE, name="Development Status") +
  theme_ipsum() +
  facet_wrap(~ EnergySource, scales = "free") + 
  theme(
    axis.title.x = element_text(hjust = 0.5, size = 15), 
    axis.title.y = element_text(hjust = 0.5, size = 15),
    strip.text.y = element_text(size = 12),
    strip.text.x = element_blank(),
    legend.text = element_text(size = 14),
    legend.title = element_text(size = 16),
    plot.title = element_text(face = "plain", hjust = 0.5, size = 22),
    legend.position = "bottom"
  ) +
  labs(x = "Variable", y = "Per capita value × 100") +
  scale_y_continuous(limits = c(0, NA)) + # Set the y-axis to not go below 0
  ggtitle("Distribution of variables by development status")

Note: As for the histogram, we used per capita values multiplied by 100 from 1990 onward, split into developed (alldev100_1990) and non-developed (allnondev100_1990) countries to compare the two groups. Values outside 1.5 × IQR were removed for each variable before plotting.

The violin plots show the distribution of energy generation and CO2 emissions per capita in developed and non-developed countries. As in the previous histogram, developed countries exhibit higher per capita values for solar, nuclear, wind and geothermal/biomass generation, while non-developed countries have more values close to 0, especially for solar, geothermal/biomass and wind. Hydro generation, however, is distributed similarly in both groups, possibly reflecting its accessibility as an energy source regardless of development status. CO2 emissions per capita are also higher in developed countries. Yet although developed countries generate more renewable energy per capita, this visualization alone cannot tell us whether an increase in renewable energy generation leads to a change in CO2 emissions per capita.
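The 1.5 × IQR rule used to trim the violin plots can be checked on a small toy vector. Note that quantile() and IQR() stop with an error on missing values unless na.rm = TRUE is set, so melted data containing NAs need that argument. The values below are hypothetical.

```r
# Toy vector with one NA and one extreme value
value <- c(1, 2, 2, 3, 3, 4, 50, NA)

q1  <- quantile(value, 0.25, na.rm = TRUE)  # 2
q3  <- quantile(value, 0.75, na.rm = TRUE)  # 3.5
iqr <- IQR(value, na.rm = TRUE)             # 1.5

lower <- q1 - 1.5 * iqr  # -0.25
upper <- q3 + 1.5 * iqr  # 5.75
kept  <- value[!is.na(value) & value > lower & value < upper]
kept  # 1 2 2 3 3 4 (50 and the NA are dropped)
```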

3.1.3 Correlation matrix

The following correlation matrix for all countries provides a clearer visualization of the relationships among renewable energy generation, CO2 emissions, and PM2.5 exposure levels:

Code
### Correlation matrix

# Keep only numeric columns, then drop columns 1 and 9 and any rows with NA values
alldf_numeric <- all_df[, sapply(all_df, is.numeric)] # numeric columns of the full (all countries) data
alldf_numeric <- na.omit(alldf_numeric[, -c(1, 9)])   # remove columns 1 and 9, then rows containing NAs

# Calculate the (Pearson) correlation matrix for all countries
allcor_matrix <- cor(alldf_numeric)

# Plot the correlation matrix
corrplot(allcor_matrix, method = "color", 
         type = "lower", 
         order = "original", 
         addCoef.col = "black", 
         tl.col = "black", 
         tl.cex = 0.6,
         tl.pos = "lt", # Place text labels on the left and top
         tl.srt = 45, # Rotation of text labels
         col = colorRampPalette(c("#6D9EC1", "white", "#E46726"))(200),
         cl.pos = "r",
         number.cex = 0.6, # Make the correlation numbers smaller
         title = "Correlation matrix for all countries",
         mar = c(0, 0, 2, 0))

The correlation matrix displays the relationships among energy sources, CO2 emissions, and PM2.5 exposure. CO2 emissions are positively correlated with PM exposure, signaling that areas with higher emissions tend to face greater particulate matter exposure. In contrast, wind, solar, and nuclear generation show a slight negative correlation with CO2 emissions, suggesting that their effect on reducing emissions may be present but limited. Hydro and geothermal generation show almost no correlation with CO2 emissions, suggesting that their impact on emissions is minimal or masked by other factors in this analysis.

Overall, while there is an apparent association between CO2 emissions and PM exposure, the link between renewable energy generation and CO2 emissions is less pronounced in this analysis. This could indicate that factors such as the overall energy policy, efficiency measures, and the balance between industrial output and energy consumption may also play significant roles in determining CO2 emissions beyond the scope of renewable energy generation alone.
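Each cell of the matrix is a Pearson correlation coefficient, the default method of `cor()`. As a sanity check, the coefficient can be recomputed from its definition, and `cor.test()` gives a p-value for judging whether a weak coefficient is distinguishable from zero. A minimal sketch on hypothetical vectors (the values are invented for illustration):

```r
# Hypothetical per capita values for five country-years
wind <- c(0.1, 0.5, 0.9, 1.4, 2.0)
co2  <- c(9.0, 8.7, 8.1, 8.3, 7.2)

# Pearson correlation from its definition: the co-variation of the centred
# vectors divided by the product of their spreads
manual_r <- sum((wind - mean(wind)) * (co2 - mean(co2))) /
  sqrt(sum((wind - mean(wind))^2) * sum((co2 - mean(co2))^2))

builtin_r <- cor(wind, co2)  # method = "pearson" is the default
print(c(manual = manual_r, builtin = builtin_r))

# Significance test for the correlation coefficient
cor.test(wind, co2)$p.value
```

Because the matrix pools all countries and years, a weak pooled coefficient does not by itself rule out a relationship within individual countries.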

3.2 EDA specific to renewable energy observations

3.2.1 The Growth of Renewables

Our dataset commences in the 1960s, capturing a period marked by substantial transformations and growth in the adoption of renewable energy sources. The trends, shown graphically below, underscore a global shift toward sustainable and diversified energy sources, propelled by environmental consciousness, policy incentives, and technological advancements.

Code
### Area plot

# Calculate annual total energy production by source for each year
raw_df_annual <- raw_df %>%
  filter(year <= 2021) %>%
  pivot_longer(
    cols = c(wind_generation, solar_generation, geo_generation, hydro_generation, nuclear_generation),
    names_to = "source",
    values_to = "energy"
  ) %>%
  group_by(year, source) %>%
  summarize(
    total_energy = sum(energy, na.rm = TRUE)
  ) %>%
  ungroup()

# Create an area plot
a <- ggplot(raw_df_annual, aes(x = year, y = total_energy, fill = source)) +
  geom_area() +
  labs(
    x = "Year",
    y = "Total Energy Production (TWh)",
    fill = "Energy Source") +
  scale_fill_manual(values = c("wind_generation" = "#a6cee3", "solar_generation" = "#FDBA74", "geo_generation" = "#b2df8a", "hydro_generation" = "#1f78b4", "nuclear_generation" = "#B39EB5"),
                    labels = c("wind_generation" = "Wind", "solar_generation" = "Solar", "geo_generation" = "Geothermal", "hydro_generation" = "Hydro", "nuclear_generation" = "Nuclear")) +
  theme_minimal() + 
  theme(
    plot.title = element_text(hjust = 0.5, size = 12),
    axis.text.y = element_text(size = 7),
    axis.text.x = element_text(size = 7),
    axis.title.x = element_text(size = 9),
    axis.title.y = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9)
    ) +
  ggtitle("Annual growth of total energy production by source")

  • Hydroelectric generation from dams and reservoirs is clearly the primary source of renewable energy from the earliest years of our data.
  • Although nuclear power began gaining traction in the 1970s, it witnessed a substantial surge in development in the mid-1980s, as numerous countries constructed reactors for high-density electricity production. Despite its potential, concerns about safety and waste management have tempered its growth.
  • Solar energy, driven by remarkable progress in photovoltaic technology, has exhibited exponential growth since approximately 2010, finding widespread application in residential and commercial settings.
  • Likewise, wind power, propelled by advancements in turbine technology, has undergone substantial expansion since the early 2000s, particularly through the establishment of onshore and offshore wind farms.
  • Geothermal energy, constrained by region-specific limitations, has experienced gradual adoption over the past six decades.

3.2.2 Countries Driving Transition

Traditional global leaders in renewable energy production, such as the United States, Canada, and France, have played a pivotal role in advancing cleaner, more sustainable energy sources globally. These nations have made significant strides, particularly in harnessing hydroelectric and nuclear energy.

Code
### TOTAL Renewable Energy Bar Chart 

# Calculate cumulative total energy production by country
raw_df_cumulative <- raw_df %>%
  group_by(country) %>%
  summarize(
    wind_total = sum(wind_generation, na.rm = TRUE),
    solar_total = sum(solar_generation, na.rm = TRUE),
    geo_total = sum(geo_generation, na.rm = TRUE),
    hydro_total = sum(hydro_generation, na.rm = TRUE),
    nuclear_total = sum(nuclear_generation, na.rm = TRUE),
    total_energy = sum(wind_total, solar_total, geo_total, hydro_total, nuclear_total)
  ) %>%
  ungroup() %>%
  slice_max(total_energy, n = 10) # keep the 10 countries with the largest cumulative total

# Reorder the levels of the country factor
raw_df_cumulative$country <- reorder(raw_df_cumulative$country, raw_df_cumulative$total_energy)

# Reshape the data to long format for stacking
raw_df_cumulative_long <- raw_df_cumulative %>%
  pivot_longer(
    cols = c(wind_total, solar_total, geo_total, hydro_total, nuclear_total),
    names_to = "source",
    values_to = "cumulative_total"
  )

# Create the stacked bar chart
b1 <- ggplot(raw_df_cumulative_long, aes(x = country, y = cumulative_total, fill = source)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Countries with historically the largest total renewable energy production",
    x = "Country",
    y = "Cumulative Energy Production (TWh)",
    fill = "Energy Source"
  ) +
  scale_fill_manual(values = c("wind_total" = "#a6cee3", "solar_total" = "#FDBA74", "geo_total" = "#b2df8a", "hydro_total" = "#1f78b4", "nuclear_total" = "#B39EB5"),
                    labels = c("wind_total" = "Wind", "solar_total" = "Solar", "geo_total" = "Geothermal", "hydro_total" = "Hydro", "nuclear_total" = "Nuclear")) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    axis.text.y = element_text(size = 7),
    axis.title.x = element_text(size = 9),
    axis.title.y = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    plot.title = element_text(hjust = 0.5, size = 11)
    )

Note: This plot presents the 10 countries with the highest historical cumulative total renewable energy generation.

More recently, China has emerged as a key player in the renewable energy landscape, with ambitions to lead the sector. Through substantial investment in solar and hydro power projects, it seeks to enhance its energy security and diversify its energy portfolio. This evolving landscape suggests a commitment among the major producers to transition toward more sustainable, environmentally friendly energy sources.

Code
### Line plot for renewable energy per country

# Calculate the annual total energy production for each country
raw_df_total_energy <- raw_df %>%
  group_by(country, year) %>%
  summarize(total_energy = sum(wind_generation, solar_generation, geo_generation, hydro_generation, nuclear_generation, na.rm = TRUE)) %>%
  ungroup()

# Keep the 10 countries with the highest cumulative total energy production over all years
top_10_countries <- raw_df_total_energy %>%
  group_by(country) %>%
  summarize(total_energy = sum(total_energy)) %>%
  ungroup() %>%
  arrange(desc(total_energy)) %>%
  slice_head(n = 10) %>% # use slice_tail(n = 10) instead for the 10 lowest
  pull(country)

filtered_data <- raw_df_total_energy %>%
  filter(country %in% top_10_countries)

# Create an interactive line plot with hover labels
plot1 <- ggplot(filtered_data, aes(x = year, y = total_energy, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of total renewable energy generation",
    x = "Year",
    y = "Energy Production (TWh)",
    color = "Country"
  ) +
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = 0.5, size = 12),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

# Convert the ggplot to a plotly object for interactivity
plotly_plot1 <- ggplotly(plot1, width = 700, height = 500, tooltip = "text")

Note: This plot presents the change in total renewable energy generation for the 10 countries with the highest cumulative generation.
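The yearly totals here are built by passing five source columns to `sum()`. In R, `sum()` returns `NA` as soon as any argument is missing unless `na.rm = TRUE` is given, so this argument matters whenever one source is unreported for a country in a given year. A minimal illustration with hypothetical values:

```r
# Hypothetical generation values (TWh) for one country-year; solar is missing
wind  <- 12.5
solar <- NA
hydro <- 40.0

sum(wind, solar, hydro)                # NA: the missing value propagates
sum(wind, solar, hydro, na.rm = TRUE)  # 52.5: missing values are skipped
```

Skipping missing values implicitly treats them as zero generation, which is reasonable for sources a country does not use but worth keeping in mind for incomplete reporting.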

3.2.3 Per Capita Leaders

The countries with the largest per capita renewable energy production are predominantly wealthy Nordic nations whose expansive and diverse natural landscapes favor renewable generation, with cascading rivers for hydroelectric power and favorable geology for geothermal energy. Their commitment to sustainable development, coupled with robust economies, has positioned them at the forefront of the global transition toward renewable energy.

Code
### Per Capita Renewable Energy Bar Chart 

# Calculate average annual per capita energy production by country
# (sum(..., na.rm = TRUE) / n() divides by all rows, so missing years count as zero)
all_df_average <- all_df %>%
  group_by(country) %>%
  summarize(
    wind_avg = sum(wind_generation, na.rm = TRUE) / n(),
    solar_avg = sum(solar_generation, na.rm = TRUE) / n(),
    geo_avg = sum(geo_generation, na.rm = TRUE) / n(),
    hydro_avg = sum(hydro_generation, na.rm = TRUE) / n(),
    nuclear_avg = sum(nuclear_generation, na.rm = TRUE) / n(),
    total_energy_avg = sum(wind_avg, solar_avg, geo_avg, hydro_avg, nuclear_avg)
  ) %>%
  ungroup() %>%
  arrange(total_energy_avg) %>%
  top_n(10, total_energy_avg) # 10 for the 10 highest, -10 for the 10 lowest

# Reorder the levels of the country factor
all_df_average$country <- reorder(all_df_average$country, all_df_average$total_energy_avg)

# Reshape the data to long format for stacking
all_df_average_long <- all_df_average %>%
  pivot_longer(
    cols = c(wind_avg, solar_avg, geo_avg, hydro_avg, nuclear_avg),
    names_to = "source",
    values_to = "average_total"
  )

# Create the stacked bar chart
b2 <- ggplot(all_df_average_long, aes(x = country, y = average_total, fill = source)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Countries with historically the largest renewable energy production per capita",
    x = "Country",
    y = "Average Energy Production (TWh)",
    fill = "Energy Source"
  ) +
  scale_fill_manual(values = c("wind_avg" = "#a6cee3", "solar_avg" = "#FDBA74", "geo_avg" = "#b2df8a", "hydro_avg" = "#1f78b4", "nuclear_avg" = "#B39EB5"),
                    labels = c("wind_avg" = "Wind", "solar_avg" = "Solar", "geo_avg" = "Geothermal", "hydro_avg" = "Hydro", "nuclear_avg" = "Nuclear")) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    axis.text.y = element_text(size = 7),
    axis.title.x = element_text(size = 10),
    axis.title.y = element_text(size = 10),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    plot.title = element_text(hjust = 0.5, size = 11))

Note: This plot presents the 10 countries with the highest historical average renewable energy generation per capita.
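Dividing `sum(x, na.rm = TRUE)` by `n()` is not the same as `mean(x, na.rm = TRUE)`: the former divides by every year in the group, so missing years effectively count as zero generation, while the latter averages only the observed years. The difference is easy to see on a hypothetical series (values invented for illustration):

```r
# Hypothetical per capita hydro generation for one country over four years,
# with two years missing
hydro <- c(10, NA, 14, NA)

# Missing years contribute 0 but still count in the denominator
avg_as_zero <- sum(hydro, na.rm = TRUE) / length(hydro)  # 24 / 4

# Average over the observed years only
avg_observed <- mean(hydro, na.rm = TRUE)                 # 24 / 2

print(c(avg_as_zero = avg_as_zero, avg_observed = avg_observed))
```

Which choice is appropriate depends on whether a missing value means "no generation" or "not reported".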

Iceland in particular, located on the Mid-Atlantic Ridge, harnesses its abundant geothermal resources together with hydropower for both electricity generation and heating, making it a global leader in per capita renewable energy production.

Code
### Line plot for renewable energy per country per capita

# Calculate the annual per capita energy production for each country
all_df_total_energy <- all_df %>%
  group_by(country, year) %>%
  summarize(total_energy = sum(wind_generation, solar_generation, geo_generation, hydro_generation, nuclear_generation, na.rm = TRUE)) %>%
  ungroup()

# Keep the 10 countries with the highest cumulative per capita energy production
top_10_countries <- all_df_total_energy %>%
  group_by(country) %>%
  summarize(total_energy = sum(total_energy)) %>%
  ungroup() %>%
  arrange(desc(total_energy)) %>%
  slice_head(n = 10) %>% # use slice_tail(n = 10) instead for the 10 lowest
  pull(country)

filtered_data <- all_df_total_energy %>%
  filter(country %in% top_10_countries)

# Create an interactive line plot with hover labels
plot2 <- ggplot(filtered_data, aes(x = year, y = total_energy, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of renewable energy generation per capita",
    x = "Year",
    y = "Per capita Energy Production (TWh)",
    color = "Country"
  ) +
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = 0.5, size = 12),
    axis.title.y = element_text(size = 10),
    axis.title.x = element_text(size = 10),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
  )

# Convert the ggplot to a plotly object for interactivity
plotly_plot2 <- ggplotly(plot2, width = 700, height = 500, tooltip = "text")

Note: This plot presents the change in renewable energy generation per capita for the 10 countries with the highest generation.

3.3 EDA specific to CO2 emissions

3.3.1 Cumulative CO2 emissions ranking

For the CO2 emissions analysis, we concentrate on the 10 countries with the highest cumulative CO2 emissions within each group, developed and developing, since these countries shape the overall emissions trend and account for most CO2 emissions. This focus gives a clearer view of the prevailing trends and allows us to analyze disparities between developed and developing nations.

The following bar plot presents the developed countries with the highest cumulative CO2 emissions per capita, highlighting Qatar's substantial lead, followed by the United Arab Emirates and Luxembourg, with the remaining nations showing comparatively lower emissions. Assessing emissions per capita is useful because it allows meaningful comparisons between countries regardless of population size.
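To see why per capita and total views can rank countries differently, consider two hypothetical countries; the names "Large" and "Small" and all figures below are invented for illustration:

```r
# Hypothetical annual emissions and populations
countries <- data.frame(
  country    = c("Large", "Small"),
  total_t    = c(5e9, 8e7),  # total CO2 emissions, tons
  population = c(1e9, 2e6),
  stringsAsFactors = FALSE
)

# Per capita emissions = total emissions / population
countries$per_capita_t <- countries$total_t / countries$population

# "Large" leads on total emissions, "Small" on emissions per person
rank_total      <- countries[order(-countries$total_t), "country"]
rank_per_capita <- countries[order(-countries$per_capita_t), "country"]
print(rank_total)
print(rank_per_capita)
```

The small country emits far less in total, yet each of its residents emits eight times as much, which is the kind of contrast the per capita charts are designed to surface.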

Code
#### BAR CHART PER CAPITA FOR DEVELOPED COUNTRY

# Calculate the cumulative CO2 emissions per capita for each country
dev_df_cumulative_co2 <- developed_df %>%
  group_by(country) %>%
  summarize(
    co2_cumulative = sum(CO2_emissions, na.rm = TRUE)  
  ) %>%
  ungroup() %>%
  arrange(desc(co2_cumulative)) %>%
  top_n(10, co2_cumulative)  # Top 10 countries with the highest total CO2 emissions

# Create the bar chart for cumulative CO2 emissions per capita
p1 <- ggplot(dev_df_cumulative_co2, aes(x = reorder(country, co2_cumulative), y = co2_cumulative, fill = co2_cumulative)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "lightblue", high = "red", name = "Cumulative CO2\nEmissions per capita") +
  labs(
    title = "Developed countries with the highest cumulative CO2 emissions per capita",
    x = "Country",
    y = "Cumulative CO2 Emissions per capita"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    axis.text.y = element_text(size = 7),
    axis.title.x = element_text(size = 8),
    axis.title.y = element_text(size = 8),
    legend.text = element_text(size = 7),
    legend.title = element_text(size = 8),
    plot.title = element_text(hjust = 0.5, size = 10))

Note: This plot presents the 10 developed countries with the highest historical cumulative CO2 emissions per capita.

Among developing countries, the bar plot shows that Kuwait has the highest cumulative CO2 emissions per capita, followed by Trinidad & Tobago and Turkmenistan. While developing nations are generally associated with lower emissions, these cases are exceptions where per capita emissions are substantial.

Code
##### BAR CHART PER CAPITA FOR NON-DEVELOPED COUNTRY

# Calculate the cumulative CO2 emissions per capita for each country
nondev_df_cumulative_co2 <- nondev_df %>%
  group_by(country) %>%
  summarize(
    co2_cumulative = sum(CO2_emissions, na.rm = TRUE)  
  ) %>%
  ungroup() %>%
  arrange(desc(co2_cumulative)) %>%
  top_n(10, co2_cumulative)  # Top 10 countries with the highest total CO2 emissions

# Create the bar chart for cumulative CO2 emissions per capita
p2 <- ggplot(nondev_df_cumulative_co2, aes(x = reorder(country, co2_cumulative), y = co2_cumulative, fill = co2_cumulative)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "lightblue", high = "red", name = "Cumulative CO2\nEmissions per capita") +
  labs(
    title = "Developing countries with the highest cumulative CO2 emissions per capita",
    x = "Country",
    y = "Cumulative CO2 Emissions per capita"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    axis.text.y = element_text(size = 7),
    axis.title.x = element_text(size = 8),
    axis.title.y = element_text(size = 8),
    legend.text = element_text(size = 7),
    legend.title = element_text(size = 8),
    plot.title = element_text(hjust = 0.5, size = 10))

Note: This plot presents the 10 developing countries with the highest historical cumulative CO2 emissions per capita.

Turning to total emissions, the next plot shows which developed countries are the largest total emitters of CO2. The US and China are the predominant emitters, with cumulative output far exceeding that of the other nations on the list, which highlights the significant role they play in global CO2 emissions.

Code
### BAR CHART TOTAL FOR DEVELOPED COUNTRIES

# Calculate the total CO2 emissions for each country by multiplying per capita emissions with population
dev_df_total_co2 <- developed_df %>%
  mutate(total_CO2_emissions = CO2_emissions * Population) %>%
  group_by(country) %>%
  summarize(
    co2_total = sum(total_CO2_emissions, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  arrange(desc(co2_total)) %>%
  top_n(10, co2_total) # top_n with 10 to get the top 10

# Create the bar chart for total CO2 emissions with gradient fill
p3 <- ggplot(dev_df_total_co2, aes(x = reorder(country, co2_total), y = co2_total, fill = co2_total)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "lightblue", high = "red", name = "Total cumulative \nCO2 Emissions") +
  labs(
    title = "Developed countries with the highest cumulative total CO2 Emissions",
    x = "Country",
    y = "Total CO2 Emissions (tons)"
  ) +
  theme_minimal() +
  theme(    
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    axis.text.y = element_text(size = 7),
    axis.title.x = element_text(size = 8),
    axis.title.y = element_text(size = 8),
    legend.text = element_text(size = 7),
    legend.title = element_text(size = 8),
    plot.title = element_text(hjust = 0.5, size = 10))

Note: This plot presents the 10 developed countries with the highest historical cumulative total CO2 emissions.

Regarding the total cumulative CO2 emissions of developing nations, the following bar chart shows the dominant emissions of India and Russia, likely reflecting the intensity of their economic activity and energy use. The large gaps in emissions across these countries highlight their differing stages of industrial growth and energy infrastructure.

Code
#### BAR CHART TOTAL FOR NON-DEVELOPED COUNTRIES

# Calculate the total CO2 emissions for each country by multiplying per capita emissions with population
nondev_df_total_co2 <- nondev_df %>%
  mutate(total_CO2_emissions = CO2_emissions * Population) %>%
  group_by(country) %>%
  summarize(
    co2_total = sum(total_CO2_emissions, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  arrange(desc(co2_total)) %>%
  top_n(10, co2_total) # top_n with 10 to get the top 10

# Create the bar chart for total CO2 emissions with gradient fill
p4 <- ggplot(nondev_df_total_co2, aes(x = reorder(country, co2_total), y = co2_total, fill = co2_total)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "lightblue", high = "red", name = "Total cumulative \nCO2 Emissions") +
  labs(
    title = "Developing countries with the highest cumulative total CO2 emissions",
    x = "Country",
    y = "Total CO2 Emissions (tons)"
  ) +
  theme_minimal() +
  theme(    
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    axis.text.y = element_text(size = 7),
    axis.title.x = element_text(size = 8),
    axis.title.y = element_text(size = 8),
    legend.text = element_text(size = 7),
    legend.title = element_text(size = 8),
    plot.title = element_text(hjust = 0.5, size = 10)
    )

Note: This plot presents the 10 developing countries with the highest historical cumulative total CO2 emissions.

3.3.2 Are We Reducing CO2 Emissions?

It's all well and good expanding our renewable energy potential, but what is the outlook for global carbon emissions? A clear divergence appears between developed and undeveloped nations. Developed countries show a downward trend in per capita CO2 emissions, potentially reflecting their efforts toward sustainable practices. Conversely, undeveloped nations show an upward trajectory in per capita CO2 emissions, which could be attributed to industrialization, increased consumption, and/or limited access to clean energy technologies.

Code
### CO2 Emissions Per Capita Change Line Plot

# Calculate the yearly mean for developed nations
developed_mean <- developed_df %>%
  group_by(year) %>%
  summarize(mean_CO2 = mean(CO2_emissions))

# Calculate the yearly mean for undeveloped nations
nondev_mean <- nondev_df %>%
  group_by(year) %>%
  summarize(mean_CO2 = mean(CO2_emissions))

# Combine the mean data frames
mean_df <- bind_rows(
  mutate(developed_mean, category = "Developed"),
  mutate(nondev_mean, category = "Undeveloped")
)

# Plot the yearly means for developed and undeveloped nations
l1 <- ggplot(mean_df, aes(x = year, y = mean_CO2, color = category)) +
  geom_line(linewidth = 1) + # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  labs(
    title = "CO2 emissions per capita : developed vs undeveloped",
    x = "Year",
    y = "Per Capita CO2 Emissions (Tons)",
    color = "Category"
  ) +
  scale_color_manual(values = c("Developed" = "blue", "Undeveloped" = "red")) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 11),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
  )

Note: The 1991 surge in CO2 emissions in undeveloped countries, shown in the chart, corresponds to the Kuwaiti oil fires (https://visibleearth.nasa.gov/images/78594/kuwait-oil-fires), not a data error.
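Because the group line is an unweighted mean across countries, a single extreme country-year can lift it noticeably. A minimal sketch with hypothetical values, where the last entry stands in for a one-off event of that kind:

```r
# Hypothetical per capita emissions (tons) for five developing countries in one year;
# the last value represents an extreme one-off event
co2_year <- c(2.1, 1.8, 2.4, 2.0, 60)

mean(co2_year)      # pulled up strongly by the single extreme value
median(co2_year)    # largely unaffected by it
mean(co2_year[-5])  # mean after excluding the anomalous observation
```

Excluding the anomalous observation or reporting the median alongside the mean are two common ways to keep such events from dominating a group trend.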

Looking at the individual trajectories of the 10 developed countries with the highest cumulative CO2 emissions per capita, we observe a pattern consistent with the earlier discussion. Developed nations generally show stable or declining emissions per capita. The decrease is particularly marked for the United Arab Emirates and Luxembourg, and Qatar shows a pronounced reduction since 2000. This could be explained by the maturation of industrialization in developed countries: growth in heavy industries has plateaued, while technological advances have made processes more efficient and shifted operations toward less carbon-intensive activities.
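Statements such as "stable or declining" can be made more precise by fitting a linear trend to each country's series and reading off the slope. A minimal sketch on hypothetical data (the country labels "A" and "B" and all values are invented); with the real data, the same per-country fit would apply to each of the plotted countries:

```r
# Hypothetical per capita CO2 emissions (tons) for two countries
emissions <- data.frame(
  country = rep(c("A", "B"), each = 5),
  year    = rep(2000:2004, times = 2),
  co2     = c(20, 19, 18.5, 17, 16,  # steadily declining
              3, 3.2, 3.5, 3.6, 4),  # steadily rising
  stringsAsFactors = FALSE
)

# Fit co2 ~ year separately for each country and keep the slope
# (change in tons per person per year)
slopes <- sapply(split(emissions, emissions$country), function(d) {
  coef(lm(co2 ~ year, data = d))[["year"]]
})
print(slopes)
```

A negative slope indicates a decline over the period. Slopes are sensitive to single anomalous years, such as Kuwait's 1991 value, so such points are worth checking before fitting.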

Code
### LINE PLOT PER CAPITA FOR DEVELOPED COUNTRY

# Exclude the year 2022 from the dataset as we don't have that year for CO2 emissions
dev_df_filtered <- developed_df %>%
  filter(!(year == 2022))

# Per capita CO2 emissions for each country and year (assuming one row per country-year, the sum is simply that year's value)
dev_df_total_CO2 <- dev_df_filtered %>%
  group_by(country, year) %>%
  summarize(total_CO2_emissions = sum(CO2_emissions, na.rm = TRUE)) %>%
  ungroup()

# Keep the 10 countries with the highest cumulative per capita CO2 emissions
top_10_countries_CO2 <- dev_df_total_CO2 %>%
  group_by(country) %>%
  summarize(total_CO2_emissions = sum(total_CO2_emissions, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(total_CO2_emissions)) %>%
  slice_head(n = 10) %>% #### slice_head for the 10 highest and slice_tail for the 10 lowest
  pull(country)

filtered_data_CO2 <- dev_df_total_CO2 %>%
  filter(country %in% top_10_countries_CO2)

# Create an interactive line plot with hover labels for CO2 emissions
plot_CO2 <- ggplot(filtered_data_CO2, aes(x = year, y = total_CO2_emissions, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of CO2 emissions per capita for developed countries",
    x = "Year",
    y = "CO2 Emissions per capita",
    color = "Country"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.6, size = 12),
    axis.title.y = element_text(size = 10),
    axis.title.x = element_text(size = 10),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
  )

# Convert the ggplot to a plotly object for interactivity
plotly_CO2_devcapita <- ggplotly(plot_CO2, width = 700, height = 500, tooltip = "text")

Note: This plot presents the change in per capita CO2 emissions for the 10 developed countries with the highest cumulative per capita emissions. CO2 emissions data are available up to 2021.

In contrast to developed countries, per capita CO2 emissions in developing countries generally follow an increasing pattern, as the following plot illustrates. Kuwait is an exception, with a notable decrease until 1980; afterwards, its emissions per capita follow the general trend of other developing countries, increasing slightly. The remaining countries show a consistent, modest rise over the years. This persistent upward trend reflects their development paths, which often involve scaling up industrial activity, energy production, and urbanization. These processes, central to economic growth, increase energy demand, which is frequently met with carbon-intensive sources that raise per capita CO2 emissions.

Code
### LINE PLOT PER CAPITA FOR NON-DEVELOPED COUNTRY

# Exclude the year 2022 from the dataset as we don't have that year for CO2 emissions
nondev_df_filtered <- nondev_df %>%
  filter(!(year == 2022), !(country == "Kuwait" & year == 1991)) 
# Purposely remove 1991 for Kuwait because of the oil fire anomaly

# Per capita CO2 emissions for each country and year (assuming one row per country-year, the sum is simply that year's value)
nondev_df_total_CO2 <- nondev_df_filtered %>%
  group_by(country, year) %>%
  summarize(total_CO2_emissions = sum(CO2_emissions, na.rm = TRUE)) %>%
  ungroup()

# Keep the 10 countries with the highest cumulative per capita CO2 emissions
top_10_countries_CO2 <- nondev_df_total_CO2 %>%
  group_by(country) %>%
  summarize(total_CO2_emissions = sum(total_CO2_emissions, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(total_CO2_emissions)) %>%
  slice_head(n = 10) %>% #### slice_head for the 10 highest and slice_tail for the 10 lowest
  pull(country)

filtered_data_CO2 <- nondev_df_total_CO2 %>%
  filter(country %in% top_10_countries_CO2)

# Create an interactive line plot with hover labels for CO2 emissions
plot_CO2 <- ggplot(filtered_data_CO2, aes(x = year, y = total_CO2_emissions, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of CO2 emissions per capita for developing countries",
    x = "Year",
    y = "CO2 Emissions per capita",
    color = "Country"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.6, size = 12),
    axis.title.y = element_text(size = 10),
    axis.title.x = element_text(size = 10),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
  )

# Convert the ggplot to a plotly object for interactivity
plotly_CO2_nondevcapita <- ggplotly(plot_CO2, width = 700, height = 500, tooltip = "text")

Note: This plot presents the change in per capita CO2 emissions for the 10 developing countries with the highest cumulative per capita emissions.

However, the contrasting per capita trajectories do not tell the whole story. In terms of total emissions, developed and undeveloped countries have followed similar upward trends for much of the last 60 years.
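Part of this contrast comes from how the group series are aggregated: the per capita lines are unweighted yearly means of each country's per capita value, so a small high-emitting country counts as much as a populous one, while the totals are dominated by populous countries. A population-weighted mean (group total emissions divided by group total population) is one way to bridge the two views. A minimal sketch with hypothetical numbers:

```r
# Hypothetical per capita emissions (tons) and populations for three countries
per_capita <- c(20, 2, 1)
population <- c(3e6, 1e9, 2e8)

# Unweighted mean: each country counts equally, as in the yearly per capita means
unweighted <- mean(per_capita)

# Population-weighted mean, equivalent to group total emissions / group total population
weighted    <- weighted.mean(per_capita, w = population)
total_based <- sum(per_capita * population) / sum(population)

print(c(unweighted = unweighted, weighted = weighted))
```

Here the small, high-emitting country pulls the unweighted mean far above the population-weighted figure, which is closer to what the total emissions reflect.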

Code
### CO2 Emissions Total Change Line Plot

# Converting CO2 emissions per capita in total values
developed_df1 <- developed_df %>%
  mutate(product = Population * CO2_emissions)

nondev_df1 <- nondev_df %>%
  mutate(product = Population * CO2_emissions)

# Calculate the annual total for the product in developed nations
developed_total <- developed_df1 %>%
  group_by(year) %>%
  summarize(total_product = sum(product))

# Calculate the annual total for the product in undeveloped nations
nondev_total <- nondev_df1 %>%
  group_by(year) %>%
  summarize(total_product = sum(product))

# Combine the total data frames
total_product_df <- bind_rows(
  mutate(developed_total, category = "Developed"),
  mutate(nondev_total, category = "Undeveloped")
)

# Plot the yearly totals for developed and undeveloped nations
l2 <- ggplot(total_product_df, aes(x = year, y = total_product, color = category)) +
  geom_line(linewidth = 1) + # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  labs(
    title = "Total CO2 Emissions : developed vs undeveloped",
    x = "Year",
    y = "Total CO2 Emissions (tons)",
    color = "Category"
  ) +
  scale_color_manual(values = c("Developed" = "blue", "Undeveloped" = "red")) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 11),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
  )

Looking again at country-level detail for a more precise understanding, the first graph below shows that, since 1980, the rise in total CO2 emissions among developed countries has been driven predominantly by the United States and, above all, China. Their emissions reflect expansive industrial activity, fuelled by strong economic growth and a much larger consumer base than other developed countries. Because the emissions of China and the US are so much higher than those of the other developed countries, the individual trends of the remaining nations are difficult to discern within the group's overall upward trajectory.
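One alternative to removing the largest emitters, which is not what this report does, is to index each country's series to its first year (base = 100) so that growth paths can be compared regardless of absolute size; putting the y-axis on a log scale with ggplot2's `scale_y_log10()` is another option. A minimal base-R sketch of the indexing step, on hypothetical figures:

```r
# Hypothetical annual totals (tonnes of CO2) for a large and a small emitter
toy <- data.frame(
  country = rep(c("Large", "Small"), each = 3),
  year    = rep(c(1990, 2000, 2010), times = 2),
  total   = c(5e9, 7e9, 10e9, 4e8, 3.6e8, 3.2e8)
)

# Sort by country and year so the first value in each group is the base year
toy <- toy[order(toy$country, toy$year), ]

# Index each country's series to its base year: 100 * value / base-year value
toy$index <- ave(toy$total, toy$country, FUN = function(x) 100 * x / x[1])

print(toy)
```

On the raw totals the small emitter's decline is invisible next to the large one; on the index it appears as a fall from 100 to 80 while the large emitter doubles.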

Code
### TOTAL LINE PLOT FOR DEVELOPED WITH CHINA AND US

# Exclude the year 2022 from the dataset as we don't have that year for CO2 emissions
dev_df_filtered <- developed_df %>%
  filter(!(year == 2022))

# Calculate the annual total CO2 emissions for each country
dev_df_total_CO2 <- dev_df_filtered %>%
  group_by(country, year) %>%
  summarize(total_CO2_emissions = sum(CO2_emissions*Population, na.rm = TRUE)) %>%
  ungroup()

# Select the top 10 countries by cumulative (all-years) total CO2 emissions
top_10_countries_CO2 <- dev_df_total_CO2 %>%
  group_by(country) %>%
  summarize(total_CO2_emissions = sum(total_CO2_emissions, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(total_CO2_emissions)) %>%
  slice_head(n = 10) %>% #### slice_head for the 10 highest and slice_tail for the 10 lowest
  pull(country)

filtered_data_CO2 <- dev_df_total_CO2 %>%
  filter(country %in% top_10_countries_CO2)

# Create an interactive line plot with hover labels for CO2 emissions
plot_CO2 <- ggplot(filtered_data_CO2, aes(x = year, y = total_CO2_emissions, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of total CO2 emissions for developed countries",
    x = "Year",
    y = "Total CO2 Emissions",
    color = "Country"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 12),
    axis.title.y = element_text(size = 10),
    axis.title.x = element_text(size = 10),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

# Convert the ggplot to a plotly object for interactivity
plotly_CO2_devtotal1 <- ggplotly(plot_CO2, width = 700, height = 500, tooltip = "text")

Note : This plot presents the total CO2 emissions change of the top 10 developed countries with the highest cumulative total emissions.

By setting aside the data from the US and China, this graph lets us identify more specific trends among the remaining developed countries. Emissions rise in all of these nations until the 1980s, after which the trends diverge: Germany, France, Italy, and Poland stabilize or decline, while the other developed countries continue on an upward trajectory. This trend is interrupted in 2020 by a steep decline across the board, most plausibly attributable to the economic disruption caused by the COVID-19 pandemic, which reduced industrial operations and travel. As economies began to recover after 2020, CO2 emissions rebounded significantly in all developed countries, suggesting a recovery in industrial activity and transportation.

This country-level analysis matters because it shows that while some nations are managing to reduce their total CO2 emissions, their efforts are offset globally by others whose emissions continue to rise sharply.
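The offsetting argument is simple arithmetic: a group's net change is the sum of the individual changes, so reductions in some countries can be outweighed by increases in others. A toy illustration in R, using hypothetical changes and placeholder country names:

```r
# Hypothetical change in total CO2 emissions between two years (Mt), by placeholder country
change_mt <- c(reducer_A = -150, reducer_B = -60, reducer_C = -80,
               increaser_D = 200, increaser_E = 180)

total_reduction <- sum(change_mt[change_mt < 0])  # -290
total_increase  <- sum(change_mt[change_mt > 0])  # 380
net_change      <- sum(change_mt)                 # 90

cat("Reductions:", total_reduction, "Mt | Increases:", total_increase,
    "Mt | Net change:", net_change, "Mt\n")
```

Three of the five countries cut emissions, yet the group total still rises by 90 Mt.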

Code
### TOTAL LINE PLOT FOR DEVELOPED WITHOUT CHINA AND US

# Exclude 2022 (no CO2 emissions data for that year), and exclude China and the US for a clearer visualization.
# Both common spellings of the US country name are listed so the exclusion holds either way.
dev_df_filtered <- developed_df %>%
  filter(year != 2022, !(country %in% c("China", "US", "United States")))

# Calculate the annual total CO2 emissions for each country
dev_df_total_CO2 <- dev_df_filtered %>%
  group_by(country, year) %>%
  summarize(total_CO2_emissions = sum(CO2_emissions*Population, na.rm = TRUE)) %>%
  ungroup()

# Select the top 10 countries by cumulative (all-years) total CO2 emissions
top_10_countries_CO2 <- dev_df_total_CO2 %>%
  group_by(country) %>%
  summarize(total_CO2_emissions = sum(total_CO2_emissions, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(total_CO2_emissions)) %>%
  slice_head(n = 10) %>% #### slice_head for the 10 highest and slice_tail for the 10 lowest
  pull(country)

filtered_data_CO2 <- dev_df_total_CO2 %>%
  filter(country %in% top_10_countries_CO2)

# Create an interactive line plot with hover labels for CO2 emissions
plot_CO2 <- ggplot(filtered_data_CO2, aes(x = year, y = total_CO2_emissions, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of total CO2 emissions for developed countries",
    x = "Year",
    y = "Total CO2 Emissions",
    color = "Country"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 12),
    axis.title.y = element_text(size = 10),
    axis.title.x = element_text(size = 10),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

# Convert the ggplot to a plotly object for interactivity
plotly_CO2_devtotal2 <- ggplotly(plot_CO2, width = 700, height = 500, tooltip = "text")

Note : To make the trends of developed countries with lower total CO2 emissions easier to distinguish, this plot presents the top 10 developed countries by total CO2 emissions, excluding China and the US.

Similar to developed nations, the following graph indicates that developing countries are experiencing an increasing trend in CO2 emissions. The rise is particularly notable in India, which, as one of the most populous nations with rapid economic growth, contributes significantly to this trend. In contrast, Russia, Ukraine, and Kazakhstan display a decrease in emissions, likely a result of the economic and industrial challenges they faced following the dissolution of the USSR.

Code
#### TOTAL LINEPLOT FOR NON-DEVELOPED 

# Exclude 2022 (no CO2 emissions data for that year) and deliberately drop Kuwait in 1991,
# an outlier caused by that year's oil fire anomaly
nondev_df_filtered <- nondev_df %>%
  filter(year != 2022, !(country == "Kuwait" & year == 1991))

# Calculate the annual total CO2 emissions for each country
nondev_df_total_CO2 <- nondev_df_filtered %>%
  group_by(country, year) %>%
  summarize(total_CO2_emissions = sum(CO2_emissions*Population, na.rm = TRUE)) %>%
  ungroup()

# Select the top 10 countries by cumulative (all-years) total CO2 emissions
top_10_countries_CO2 <- nondev_df_total_CO2 %>%
  group_by(country) %>%
  summarize(total_CO2_emissions = sum(total_CO2_emissions, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(total_CO2_emissions)) %>%
  slice_head(n = 10) %>% #### slice_head for the 10 highest and slice_tail for the 10 lowest
  pull(country)

filtered_data_CO2 <- nondev_df_total_CO2 %>%
  filter(country %in% top_10_countries_CO2)

# Create an interactive line plot with hover labels for CO2 emissions
plot_CO2 <- ggplot(filtered_data_CO2, aes(x = year, y = total_CO2_emissions, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of total CO2 emissions for developing countries",
    x = "Year",
    y = "Total CO2 Emissions",
    color = "Country"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 12),
    axis.title.y = element_text(size = 10),
    axis.title.x = element_text(size = 10),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

# Convert the ggplot to a plotly object for interactivity
plotly_CO2_nondevtotal <- ggplotly(plot_CO2, width = 700, height = 500, tooltip = "text")

Note : This plot presents the total CO2 emissions change of the top 10 developing countries with the highest cumulative total emissions. Data for Russia, Ukraine, and Kazakhstan is available from 1984 onwards, as they were previously constituents of the USSR.

These analyses suggest that, despite a decreasing trend in per-capita CO2 emissions in some developed countries, global total CO2 emissions continue to rise. This increase is largely driven by the strong economic and population growth of countries such as the US, China, and India, as well as by other developing countries. Although individual nations such as Germany, France, Italy, and Poland have reduced or stabilized their total CO2 emissions, these gains are overshadowed by increases elsewhere. This overall trajectory indicates that, without substantial and coordinated global efforts to curb emissions across both developed and developing countries, total emissions are likely to keep rising.

3.4 EDA Specific to PM2.5 exposure

3.4.1 Cumulative particulate matter 2.5 exposure ranking

In line with the methodology applied to CO2 emissions, our focus now shifts to the PM2.5 exposure across the globe. By examining the top 10 developed and developing countries with the highest cumulative PM exposure, we aim to pinpoint where the need for air quality improvement is most pressing and to observe where the most significant changes over time can be discerned.
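One caveat on this ranking: the cumulative value sums each country's yearly PM2.5 readings, so a country with fewer non-missing years can rank below a country with cleaner air. Whether this affects our ranking depends on how complete each country's series is, which can be checked by counting non-missing years alongside the sum. A base-R sketch with hypothetical values:

```r
# Hypothetical annual PM2.5 exposure (µg/m³); country B is missing two years
pm <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year    = rep(c(1990, 1995, 2000, 2005), times = 2),
  pm25    = c(30, 30, 30, 30, 50, 50, NA, NA)
)

by_country <- split(pm$pm25, pm$country)
cumulative <- sapply(by_country, sum,  na.rm = TRUE)           # A = 120, B = 100
average    <- sapply(by_country, mean, na.rm = TRUE)           # A = 30,  B = 50
n_years    <- sapply(by_country, function(x) sum(!is.na(x)))  # A = 4,   B = 2

print(data.frame(cumulative, average, n_years))
```

Country B has the worse air on average but the lower cumulative total, purely because two of its years are missing.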

The following bar plot shows the top 10 developed countries with the highest cumulative PM exposure: Qatar leads by a wide margin, with Saudi Arabia, the United Arab Emirates, and China also exhibiting high levels. The graph also highlights significant disparities among these nations, with countries such as Slovenia, Slovakia, and Israel showing considerably lower levels. Even within developed nations, air quality can differ considerably.

Code
### BAR CHART FOR DEVELOPED COUNTRIES FOR PM

# Calculate the cumulative PM exposure for each country
dev_df_cumulative_pm <- rawdev_df %>%
  group_by(country) %>%
  summarize(
    pm_total = sum(PM_exposure, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  arrange(desc(pm_total)) %>%
  slice_max(pm_total, n = 10) # Keep the 10 countries with the highest cumulative exposure

# Reshape the data to long format for plotting
dev_df_cumulative_pm_long <- dev_df_cumulative_pm %>%
  pivot_longer(
    cols = pm_total,
    names_to = "source",
    values_to = "cumulative_total"
  )

# Create the bar chart for PM exposure
pm1 <- ggplot(dev_df_cumulative_pm_long, aes(x = reorder(country, cumulative_total), y = cumulative_total, fill = cumulative_total)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "lightblue", high = "red", name = "Total PM\nExposure") +
  labs(
    title = "Developed countries with the highest cumulative PM2.5 exposure",
    x = "Country",
    y = "Cumulative PM Exposure"
  ) +
  theme_minimal() +
  theme(    
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    axis.text.y = element_text(size = 7),
    axis.title.x = element_text(size = 8),
    axis.title.y = element_text(size = 8),
    legend.text = element_text(size = 7),
    legend.title = element_text(size = 8),
    plot.title = element_text(hjust = 0.5, size = 10)
    )

Note : This plot presents the top 10 developed countries with the highest historical cumulative PM2.5 exposure.

As for developing nations, India stands out with notably high levels, reflecting the substantial air quality challenges that accompany rapid industrialization and urbanization in populous nations. Close behind, Egypt, Kuwait, Pakistan and Bangladesh also register significant PM exposure, underscoring the environmental pressures of economic and infrastructural development. In contrast to the varied PM exposure levels observed in developed countries, this plot reveals a more consistent pattern of high PM exposure across the board for developing nations, highlighting the challenge of air pollution for these countries.

Code
### BAR CHART FOR NON-DEVELOPED COUNTRIES FOR PM

# Calculate the cumulative PM exposure for each country
nondev_df_cumulative_pm <- rawnondev_df %>%
  group_by(country) %>%
  summarize(
    pm_total = sum(PM_exposure, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  arrange(desc(pm_total)) %>%
  slice_max(pm_total, n = 10) # Keep the 10 countries with the highest cumulative exposure

# Reshape the data to long format for plotting
nondev_df_cumulative_pm_long <- nondev_df_cumulative_pm %>%
  pivot_longer(
    cols = pm_total,
    names_to = "source",
    values_to = "cumulative_total"
  )

# Create the bar chart for PM exposure
pm2 <- ggplot(nondev_df_cumulative_pm_long, aes(x = reorder(country, cumulative_total), y = cumulative_total, fill = cumulative_total)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "lightblue", high = "red", name = "Total PM\nExposure") +
  labs(
    title = "Developing countries with the highest cumulative PM2.5 exposure",
    x = "Country",
    y = "Cumulative PM Exposure"
  ) +
  theme_minimal() +
  theme(    
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    axis.text.y = element_text(size = 7),
    axis.title.x = element_text(size = 8),
    axis.title.y = element_text(size = 8),
    legend.text = element_text(size = 7),
    legend.title = element_text(size = 8),
    plot.title = element_text(hjust = 0.5, size = 10)
    )

Note : This plot presents the top 10 developing countries with the highest historical cumulative PM2.5 exposure.

3.4.2 Better News on PM2.5?

The following graph presents a comparative trend of average PM2.5 exposure between developed and undeveloped nations from 1990 to 2019. As we can see, developed countries experienced a slight decline from 1990 to 2010 and a substantial fall after 2010, indicating a significant reduction in PM2.5 exposure over time. This suggests effective measures have been implemented to improve air quality, potentially including stricter regulations, cleaner technologies, and increased public awareness in these countries.

In contrast, the red line represents undeveloped countries, showing a less pronounced decline over the past 30 years. The flatter trend suggests that while there may have been some improvements, especially around 2010, the reduction in PM2.5 exposure is not substantial enough to reach the exposure level of developed nations. This could be due to a variety of factors including slower implementation of air quality regulations, ongoing industrialization, and limited resources for environmental management.

The contrast between the two lines, not only in terms of changes but also in absolute values, highlights the disparities in air quality management and public health outcomes between developed and undeveloped nations. It underscores the need for increased efforts to reduce air pollution in undeveloped countries to improve health and environmental conditions. The data suggests that while there is a global trend towards better air quality, the progress is uneven and more attention may be needed in undeveloped regions to achieve similar outcomes as seen in developed countries.
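Because PM2.5 exposure is a concentration (µg/m³), a cross-country summary for each group is most naturally an average: a sum also grows with the number of countries reporting in a given year, which would distort comparisons of absolute levels between groups of different sizes. The hypothetical single-year example below, in base R, shows how the two summaries diverge:

```r
# Hypothetical single-year PM2.5 values (µg/m³): 3 developed and 6 undeveloped countries
grp <- data.frame(
  category = c(rep("Developed", 3), rep("Undeveloped", 6)),
  pm25     = c(10, 10, 10, 20, 20, 20, 20, 20, 20)
)

group_sum  <- sapply(split(grp$pm25, grp$category), sum)   # Developed 30, Undeveloped 120
group_mean <- sapply(split(grp$pm25, grp$category), mean)  # Developed 10, Undeveloped 20

# The sum suggests a 4x gap; the mean shows the actual 2x gap in typical exposure
ratio_sum  <- group_sum[["Undeveloped"]]  / group_sum[["Developed"]]
ratio_mean <- group_mean[["Undeveloped"]] / group_mean[["Developed"]]
```

In `dplyr` terms, the difference is using `mean(..., na.rm = TRUE)` rather than `sum(...)` inside `summarize()`.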

Code
### LINE PLOT FOR PM DEV VS UNDEV

# Years for which we have PM exposure data
years_with_data <- c(1990, 1995, 2000, 2005, 2010:2019)

# Filter for only the years with PM exposure data for developed countries
developed_df_filtered <- rawdev_df %>%
  filter(year %in% years_with_data) %>%
  mutate(product = PM_exposure)

# Do the same for non-developed countries
nondev_df_filtered <- rawnondev_df %>%
  filter(year %in% years_with_data) %>%
  mutate(product = PM_exposure)

# Calculate the annual average PM2.5 exposure across developed nations
# (PM2.5 exposure is a concentration, so a cross-country mean stays comparable between groups of different sizes)
developed_total <- developed_df_filtered %>%
  group_by(year) %>%
  summarize(total_product = mean(product, na.rm = TRUE)) %>%
  mutate(category = "Developed")

# Calculate the annual average PM2.5 exposure across undeveloped nations
nondev_total <- nondev_df_filtered %>%
  group_by(year) %>%
  summarize(total_product = mean(product, na.rm = TRUE)) %>%
  mutate(category = "Undeveloped")

# Combine the data frames
total_product_df <- bind_rows(developed_total, nondev_total)

# Plot the yearly average PM2.5 exposure for developed and undeveloped nations
plot_devpm <- ggplot(total_product_df, aes(x = year, y = total_product, color = category, group = category)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Average PM2.5 Exposure : developed vs undeveloped",
    x = "Year",
    y = "Average PM2.5 Exposure",
    color = "Category"
  ) +
  scale_color_manual(values = c("Developed" = "blue", "Undeveloped" = "red")) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 11),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
  )

Note : Data for PM2.5 exposure are available for the years 1990, 1995, 2000, 2005, and annually from 2010 to 2019.

Once again, let’s take a closer look at the country level to identify PM2.5 exposure trends in developed and developing nations. This line graph shows the trajectory of PM exposure for the top 10 developed countries with the highest cumulative exposure and reveals a complex, varied landscape. Since 1990, the trends have not followed a uniform pattern; for instance, Saudi Arabia, Qatar, the United Arab Emirates, and China have not experienced any significant decrease in PM2.5 exposure over time.

Code
### LINE PLOT FOR DEVELOPED COUNTRIES FOR HIGH PM

# Years for which we have PM exposure data
years_with_data <- c(1990, 1995, 2000, 2005, 2010:2019)

# Filter for only the years with PM exposure data
dev_df_filtered <- rawdev_df %>%
  filter((year %in% years_with_data))

# Calculate cumulative PM exposure for each country to select the 10 highest
top_10_countries_pm <- dev_df_filtered %>%
  group_by(country) %>%
  summarize(cumulative_pm_exposure = sum(PM_exposure, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(cumulative_pm_exposure)) %>% # To have descending order
  slice_head(n = 10) %>% # To have the highest exposure
  pull(country)

# Keep only the top 10 countries with the highest cumulative PM exposure
filtered_data_pm <- dev_df_filtered %>%
  filter(country %in% top_10_countries_pm)

# Create an interactive line plot with hover labels for PM exposure
plot_pm <- ggplot(filtered_data_pm, aes(x = year, y = PM_exposure, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of highest PM2.5 exposure level for developed countries",
    x = "Year",
    y = "Annual PM Exposure",
    color = "Country"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 11),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

# Convert the ggplot to a plotly object for interactivity
plotly_devpm1 <- ggplotly(plot_pm, width = 700, height = 500, tooltip = "text")

Note : This plot presents the PM2.5 exposure change of the top 10 developed countries with the highest exposure.

Excluding Saudi Arabia, Qatar, the United Arab Emirates, and China reveals a more substantial decline in exposure levels among the remaining high-exposure developed nations, with the exception of South Korea and Singapore. This suggests that while most of these countries have made notable progress, others have seen minimal improvement or have struggled to reduce PM2.5 pollution.

Code
### LINE PLOT FOR DEVELOPED COUNTRIES FOR HIGH PM WITHOUT TOO HIGH VALUES

# Years for which we have PM exposure data
years_with_data <- c(1990, 1995, 2000, 2005, 2010:2019)

# Filter for only the years with PM exposure data and countries with too high values for better visualisation
dev_df_filtered <- rawdev_df %>%
  filter((year %in% years_with_data) & !(country %in% c("China", "Qatar", "Saudi Arabia", "United Arab Emirates")))

# Calculate cumulative PM exposure for each country to select the 10 highest
top_10_countries_pm <- dev_df_filtered %>%
  group_by(country) %>%
  summarize(cumulative_pm_exposure = sum(PM_exposure, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(cumulative_pm_exposure)) %>% # To have descending order
  slice_head(n = 10) %>% # To have the highest exposure
  pull(country)

# Keep only the top 10 countries with the highest cumulative PM exposure
filtered_data_pm <- dev_df_filtered %>%
  filter(country %in% top_10_countries_pm)

# Create an interactive line plot with hover labels for PM exposure
plot_pm <- ggplot(filtered_data_pm, aes(x = year, y = PM_exposure, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of highest PM2.5 exposure level for developed countries",
    x = "Year",
    y = "Annual PM Exposure",
    color = "Country"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 11),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

# Convert the ggplot to a plotly object for interactivity
plotly_devpm2 <- ggplotly(plot_pm, width = 700, height = 500, tooltip = "text")

Note : In this plot, we remove the observations for China, Qatar, Saudi Arabia and the United Arab Emirates, as their high values made it difficult to highlight trends for the countries with lower values.

What about developed countries with lower PM2.5 exposure levels? The following plot suggests that every nation with relatively low PM2.5 exposure has experienced significant reductions over time. This consistent decline is in line with the idea that effective PM reduction strategies are in place and can yield positive results for developed nations with lower exposure.
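One detail matters when selecting the lowest-exposure countries: in R, `sum()` over values that are all `NA` returns `0` when `na.rm = TRUE`, so a country with no PM2.5 data at all would receive a cumulative exposure of 0 and be picked as one of the "lowest". Whether any country in our data lacks PM2.5 values entirely is not shown here; the hypothetical sketch below illustrates the effect and one way to guard against it by dropping missing values before aggregating.

```r
# Hypothetical data: country "C" has no PM2.5 values at all
pm <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  pm25    = c(12, 10, 8, 7, NA, NA)
)

# Naive cumulative totals: the all-NA country gets 0 and looks cleanest
totals_naive <- sapply(split(pm$pm25, pm$country), sum, na.rm = TRUE)  # A 22, B 15, C 0
lowest_naive <- names(which.min(totals_naive))                          # "C"

# Dropping missing values first removes countries that have no data
pm_complete     <- pm[!is.na(pm$pm25), ]
totals_complete <- sapply(split(pm_complete$pm25, pm_complete$country), sum)  # A 22, B 15
lowest_checked  <- names(which.min(totals_complete))                          # "B"
```

In `dplyr` terms, the equivalent guard is `filter(!is.na(PM_exposure))` before `group_by()` and `summarize()`.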

Code
### LINE PLOT FOR DEVELOPED COUNTRIES FOR LOW PM

# Years for which we have PM exposure data
years_with_data <- c(1990, 1995, 2000, 2005, 2010:2019)

# Filter for only the years with PM exposure data, dropping missing values so that
# countries without any PM data do not get a cumulative total of 0 and rank as "lowest"
dev_df_filtered <- rawdev_df %>%
  filter(year %in% years_with_data, !is.na(PM_exposure))

# Calculate cumulative PM exposure for each country to select the 10 lowest
top_10_countries_pm <- dev_df_filtered %>%
  group_by(country) %>%
  summarize(cumulative_pm_exposure = sum(PM_exposure, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(cumulative_pm_exposure)) %>% # To have descending order
  slice_tail(n = 10) %>% # The last 10 rows hold the lowest exposure
  pull(country)

# Keep only the top 10 countries with the lowest cumulative PM exposure
filtered_data_pm <- dev_df_filtered %>%
  filter(country %in% top_10_countries_pm)

# Create an interactive line plot with hover labels for PM exposure
plot_pm <- ggplot(filtered_data_pm, aes(x = year, y = PM_exposure, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of lowest PM2.5 exposure level for developed countries",
    x = "Year",
    y = "Annual PM Exposure",
    color = "Country"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 11),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

# Convert the ggplot to a plotly object for interactivity
plotly_devpm3 <- ggplotly(plot_pm, width = 700, height = 500, tooltip = "text")

Note : This plot presents the PM2.5 exposure change of the top 10 developed countries with the lowest exposure.

For developing nations, we observe the same phenomenon as for developed countries, but it appears more pronounced: none of the top 10 developing countries with the highest exposure shows a significant decrease over time. The following plot reveals minor increases or stabilization rather than a clear decline, which contrasts with the slight overall downward trend observed for developing nations as a group.
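Statements such as "no significant decrease" could be made more precise by fitting a linear trend to each country's series and inspecting the slope (in µg/m³ per year) together with its uncertainty. The sketch below, a technique suggested here rather than one used in this report, fits PM2.5 against year per country on hypothetical, noise-free series laid out on the report's irregular year grid:

```r
# Years with PM2.5 data in the report: five-yearly to 2010, then annual to 2019
years <- c(1990, 1995, 2000, 2005, 2010:2019)

# Hypothetical, noise-free series for two placeholder countries
toy <- data.frame(
  country = rep(c("Declining", "Rising"), each = length(years)),
  year    = rep(years, times = 2)
)
toy$pm25 <- ifelse(toy$country == "Declining",
                   40 - 0.5 * (toy$year - 1990),
                   60 + 0.1 * (toy$year - 1990))

# Fit PM2.5 ~ year separately for each country and keep the slope (µg/m³ per year)
slopes <- sapply(split(toy, toy$country), function(d) {
  unname(coef(lm(pm25 ~ year, data = d))["year"])
})
print(slopes)
```

With real, noisy data, `summary(lm(pm25 ~ year, data = d))$coefficients` also reports the slope's standard error and p-value; because the years are unevenly spaced and yearly values are likely autocorrelated, those p-values are best read as rough indications.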

Code
### LINE PLOT FOR NON-DEVELOPED COUNTRIES FOR HIGH PM

# Years for which we have PM exposure data
years_with_data <- c(1990, 1995, 2000, 2005, 2010:2019)

# Filter for only the years with PM exposure data
nondev_df_filtered <- rawnondev_df %>%
  filter(year %in% years_with_data)

# Calculate cumulative PM exposure for each country to select the 10 highest
top_10_countries_pm <- nondev_df_filtered %>%
  group_by(country) %>%
  summarize(cumulative_pm_exposure = sum(PM_exposure, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(cumulative_pm_exposure)) %>% # To have descending order
  slice_head(n = 10) %>% # To have the highest exposure
  pull(country)

# Keep only the top 10 countries with the highest cumulative PM exposure
filtered_data_pm <- nondev_df_filtered %>%
  filter(country %in% top_10_countries_pm)

# Create an interactive line plot with hover labels for PM exposure
plot_pm <- ggplot(filtered_data_pm, aes(x = year, y = PM_exposure, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of highest PM2.5 exposure level for developing countries",
    x = "Year",
    y = "Annual PM Exposure",
    color = "Country"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 11),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

# Convert the ggplot to a plotly object for interactivity
plotly_nondevpm1 <- ggplotly(plot_pm, width = 700, height = 500, tooltip = "text")

Note : This plot presents the PM2.5 exposure change of the top 10 developing countries with the highest exposure.

So, which developing countries drive the downward trend in PM2.5 exposure? To answer this question, we analyze the developing nations with the lowest exposure levels. The graph for these countries shows a substantial decrease. This suggests that, as with developed countries but more markedly, developing nations starting from lower PM2.5 levels have been more successful at implementing effective air quality management strategies than those with higher exposure, leading to larger improvements over time.

Code
### LINE PLOT FOR NON-DEVELOPED COUNTRIES FOR LOW PM

# Years for which we have PM exposure data
years_with_data <- c(1990, 1995, 2000, 2005, 2010:2019)

# Filter for only the years with PM exposure data, dropping missing values so that
# countries without any PM data do not get a cumulative total of 0 and rank as "lowest"
nondev_df_filtered <- rawnondev_df %>%
  filter(year %in% years_with_data, !is.na(PM_exposure))

# Calculate cumulative PM exposure for each country to select the 10 lowest
top_10_countries_pm <- nondev_df_filtered %>%
  group_by(country) %>%
  summarize(cumulative_pm_exposure = sum(PM_exposure, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(cumulative_pm_exposure)) %>% # To have descending order
  slice_tail(n = 10) %>% # The last 10 rows hold the lowest exposure
  pull(country)

# Keep only the top 10 countries with the lowest cumulative PM exposure
filtered_data_pm <- nondev_df_filtered %>%
  filter(country %in% top_10_countries_pm)

# Create an interactive line plot with hover labels for PM exposure
plot_pm <- ggplot(filtered_data_pm, aes(x = year, y = PM_exposure, group = country, color = country, text = country)) +
  geom_line() +
  labs(
    title = "Annual change of lowest PM2.5 exposure level for developing countries",
    x = "Year",
    y = "Annual PM Exposure",
    color = "Country"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 11),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

# Convert the ggplot to a plotly object for interactivity
plotly_nondevpm2 <- ggplotly(plot_pm, width = 700, height = 500, tooltip = "text")

Note : This plot presents the PM2.5 exposure change of the top 10 developing countries with the lowest exposure.

Concluding our analysis of PM exposure levels, we must acknowledge the complexity in drawing health-related conclusions from our dataset. The figures represent an annual average across entire countries, which may obscure localized PM concentrations that are typically higher in urban areas. Therefore, it’s impractical to declare specific national levels as universally dangerous without considering regional variations and city-level data.

Contrasting PM exposure with CO2 emissions reveals a somewhat optimistic picture. Unlike the consistent rise in CO2 emissions, PM exposure levels are declining in both developed and developing countries, indicating progressive strides in air quality management. Despite this positive trend, it’s clear that developing countries still have significant room for improvement to achieve the air quality standards seen in more developed nations.

3.5 The 2019 Overview: Renewable Energies, Emissions, and PM2.5 Exposure

After analyzing the trends for renewable energy generation, CO2 emissions, and PM2.5 exposure, the following interactive map provides a snapshot of the global situation in 2019. This visual tool allows us to explore the interplay between renewable energy generation and the environmental indicators of CO2 emissions and PM2.5 exposure levels.

This interactive map shows that populous nations such as China, the United States, India, and Russia have high total CO2 emissions compared with less populated countries. PM2.5 exposure levels, however, paint a more nuanced picture, reflecting the multifaceted nature of air quality issues. The United States and China, despite their high CO2 emissions, report lower PM2.5 exposure than their emissions would suggest, possibly reflecting effective pollution management, advanced emission control technologies, or a country-size effect (a national average over a large territory dilutes local hotspots). In contrast, India and Saudi Arabia, and to a lesser degree Iran and South Korea, exhibit both substantial CO2 emissions and high PM2.5 exposure. In the Middle East and Eastern Europe, PM2.5 exposure is notably high relative to total CO2 emissions, whereas western European countries such as France, Italy, and the UK show the inverse, with lower PM2.5 exposure relative to their CO2 emissions footprint. This disparity suggests that factors beyond population, such as regional industrial activity, air quality policy, and energy production methods, may significantly influence these metrics.

To conclude the analysis of the interactive map, China and the United States show substantial production across all five energy sources displayed (solar, wind, hydro, nuclear, and geothermal), while other countries vary markedly. For instance, France and Russia are prominent in nuclear generation but lag in the other sources. Similarly, Germany, India, and Brazil prominently utilize geothermal resources, yet make limited use of the other renewable sources. This pattern suggests that, beyond the United States and China, countries often do not capitalize on every renewable energy opportunity, likely because of the interplay of factors such as infrastructure capabilities, climatic conditions, political will, and economic policies.

Ultimately, the interactive map does not elucidate a straightforward correlation between CO2 emissions, PM2.5 exposure, and renewable energy generation. While it serves as a valuable tool for contextualizing the current landscape of environmental indicators, it also highlights the absence of a uniform global pattern, reinforcing the need for localized insights to understand and address the nuances of air quality and emissions.

### INTERACTIVE MAP

library(dplyr)
library(plotly)

# 2019 is the most recent year with PM2.5 exposure observations
filtered_data_raw <- raw_df %>%
  filter(year == 2019) %>%
  # Convert per capita emissions to totals, then divide by 1e8 so the CO2 markers
  # stay readable; the proportions between countries are unchanged
  mutate(CO2_emissions = (CO2_emissions * Population) / 1e8)

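# Dropdown buttons: each 'visible' vector toggles the 7 traces in the order they are
# added below (Solar, Wind, Hydro, Nuclear, Geo choropleths, then the CO2 and PM2.5
# marker layers, which always stay visible)
updatemenus <- list(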
  list(
    active = 0,
    x = 0.2,
    y = 0.99,
    buttons = list(
      list(
        label = "Solar",
        method = "update",
        args = list(list(visible = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)))
      ),
      list(
        label = "Wind",
        method = "update",
        args = list(list(visible = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)))
      ),
      list(
        label = "Hydro",
        method = "update",
        args = list(list(visible = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)))
      ),
      list(
        label = "Nuclear",
        method = "update",
        args = list(list(visible = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)))
      ),
      list(
        label = "Geo",
        method = "update",
        args = list(list(visible = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)))
      )
    )
  )
)

### TOTAL

int <- plot_geo(filtered_data_raw, locationmode = 'country names') %>%
  add_trace(
    z = ~solar_generation,
    locations = ~country,
    color = ~solar_generation,
    colors = 'Reds',
    name = 'Solar',
    colorbar = list(title = "Solar Generation (TWh)", len = 1, y = 0.1, orientation = "h")
  ) %>%
  add_trace(
    z = ~wind_generation,
    locations = ~country,
    color = ~wind_generation,
    name = 'Wind',
    colorbar = list(title = "Wind Generation (TWh)", len = 1, y = 0.1, orientation = "h"),
    visible = FALSE
  ) %>%
  add_trace(
    z = ~hydro_generation,
    locations = ~country,
    color = ~hydro_generation,
    name = 'Hydro',
    colorbar = list(title = "Hydro Generation (TWh)", len = 1, y = 0.1, orientation = "h"),
    visible = FALSE
  ) %>%
  add_trace(
    z = ~nuclear_generation,
    locations = ~country,
    color = ~nuclear_generation,
    name = 'Nuclear',
    colorbar = list(title = "Nuclear Generation (TWh)", len = 1, y = 0.1, orientation = "h"),
    visible = FALSE
  ) %>%
  add_trace(
    z = ~geo_generation,
    locations = ~country,
    color = ~geo_generation,
    name = 'Geo',
    colorbar = list(title = "Geo Generation (TWh)", len = 1, y = 0.1, orientation = "h"),
    visible = FALSE
  ) %>%
  add_trace(
    type = 'scattergeo',  # Specify the trace type
    mode = 'markers',     # Use markers
    locations = ~country, # Specify countries as locations
    marker = list(
      size = ~CO2_emissions,      # Size of the markers based on CO2 emissions
      color = 'black',                 # Marker color
      sizemode = 'area',               # The size of the marker represents an area
      sizeref = 0.05,                   # Adjust this value to scale marker sizes
      line = list(color = "rgb(40,40,40)", width = 0.5)
    ),
    name = 'Total CO2 Emissions'
  ) %>%
  add_trace(
    type = 'scattergeo',  # Specify the trace type
    mode = 'markers',     # Use markers
    locations = ~country, # Specify countries as locations
    marker = list(
      size = ~PM_exposure,      # Size of the markers based on PM
      color = 'grey',                 # Marker color
      sizemode = 'area',               # The size of the marker represents an area
      sizeref = 0.5,                   # Adjust this value to scale marker sizes
      line = list(color = "rgb(40,40,40)", width = 0.5)
    ),
    name = 'PM Exposure'
  ) %>%
  layout(
    title = "2019 Overview : Renewable Energy Generation, CO2 Emissions and PM2.5 Exposure",
    showlegend = TRUE,
    updatemenus = updatemenus,
    geo = list(
      showland = TRUE,
      landcolor = toRGB("gray95"),
      countrycolor = toRGB("gray80")
    ),
    annotations = list(
      list(
        text = "<b>Click to hide circle</b>",
        x = 1.04,
        y = 0.19,
        xref = "paper",
        yref = "paper",
        xanchor = 'left',
        yanchor = 'top',
        align = 'left',
        showarrow = FALSE,
        font = list(
          size = 12,
          color = "black"
        )
      )
    ),
    width = 800,   # Figure width in pixels
    height = 600,  # Figure height in pixels
    margin = list(
      t = 100      # Top margin leaves room for the title and dropdown menu
    )
  )
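The CO2 and PM2.5 markers above rely on hand-tuned sizeref values (0.05 and 0.5), and the emissions are divided by 1e8 to keep the bubbles readable. Plotly's bubble-chart documentation suggests deriving sizeref from the data instead, as 2 * max(size) / max_px^2, where max_px is the desired diameter in pixels of the largest marker when sizemode = 'area'. The helper below is a minimal sketch of that rule; its name, bubble_sizeref, and the 40-pixel default are our own assumptions rather than part of the original code.

```r
# Hypothetical helper: compute a plotly 'sizeref' for markers drawn with
# sizemode = 'area', so that the largest value maps to roughly max_px pixels
bubble_sizeref <- function(values, max_px = 40) {
  2 * max(values, na.rm = TRUE) / max_px^2
}

# Made-up values: the largest marker (200) would be drawn about 40 px wide
bubble_sizeref(c(10, 50, 200), max_px = 40)
```

With sizemode = 'area', scaling every size value and sizeref by the same factor leaves the drawn markers unchanged, so passing sizeref = bubble_sizeref(filtered_data_raw$CO2_emissions) would make the manual division by 1e8 unnecessary.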

Note: This interactive map presents data for 2019, the most recent year for which PM2.5 exposure observations are available.

3.6 Relationship Between Renewable Energy Generation, CO2 emissions and PM2.5 exposure

The exploration so far has provided a comprehensive overview of the status and trends in renewable energy generation, CO2 emissions, and PM2.5 exposure. However, this exploratory analysis does not allow us to conclude that changes in CO2 emissions and PM2.5 exposure are due solely to the rise in renewable energy use. This underscores the complexity of these phenomena and the interplay of the many factors that influence them. Therefore, uncovering potential underlying relationships or causal links requires a more in-depth analysis of these correlations.

3.6.1 Renewable Energies’ impact on CO2 emissions

On closer examination, the benefits of renewable energy are, regrettably, less visible in the data than one might expect. The graph below plots each country’s average renewable energy generation against its average total CO2 emissions and shows a positive relationship: countries that generate more renewable energy also tend to emit more CO2. Because both quantities are country totals, part of this pattern likely reflects country size, since large economies both generate and emit more. This positive relationship highlights the need for decisive measures from the world’s major CO2 emitters, emphasizing that reducing emissions demands concerted efforts beyond introducing new renewable energy projects. It also suggests that increasing renewable energy generation alone is insufficient to offset the growth in CO2 emissions driven by economic growth, industrialization, transportation, and related factors.

### TOTAL AVG RENEWABLE VS CO2

library(ggplot2)
library(ggrepel)  # geom_text_repel()

# Calculate average renewable energy production across the years
avg_renewables <- raw_df %>%
  group_by(country) %>%
  summarize(
    avg_wind = mean(wind_generation, na.rm = TRUE),
    avg_solar = mean(solar_generation, na.rm = TRUE),
    avg_geo = mean(geo_generation, na.rm = TRUE),
    avg_hydro = mean(hydro_generation, na.rm = TRUE),
    avg_nuclear = mean(nuclear_generation, na.rm = TRUE)
  ) %>%
  mutate(
    # Mean of the five per-source averages (rowSums() would give their total instead)
    avg_renewables = rowMeans(select(., starts_with("avg")), na.rm = TRUE)
  )

# Calculate average CO2 emissions across the years
avg_co2 <- raw_df %>%
  group_by(country) %>%
  summarize(avg_CO2 = mean(CO2_emissions * Population, na.rm = TRUE))

# Merge the two datasets
merged_data <- merge(avg_renewables, avg_co2, by = "country")

# Create a scatter plot with country labels and a linear trend line
# (x = renewables and y = CO2, matching the axis labels below)
avg1 <- ggplot(merged_data, aes(x = avg_renewables, y = avg_CO2, label = country)) +
  geom_point() +
  geom_text_repel(box.padding = 0.5, segment.size = 0.2) +  # Non-overlapping country labels
  geom_smooth(method = "lm", se = FALSE, color = "blue") +  # Add a linear regression line
  labs(
    title = "Total Average : Renewable Energy vs CO2 Emissions",
    x = "Average Total Renewable Energy Generated (TWh)", 
    y = "Average Total CO2 Emissions (Tons)"
  ) +
  theme_minimal() + 
  theme(
    plot.title = element_text(size = 11),
    axis.text.y = element_text(size = 6),
    axis.text.x = element_text(size = 6),
    axis.title.y = element_text(size = 8),
    axis.title.x = element_text(size = 8)
  )
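If country size drives both axes of the totals plot, a positive slope is expected even without any causal link, because large economies both generate and emit more. The simulation below is a minimal sketch with synthetic data (not the project's data): per capita renewable generation and per capita CO2 are drawn independently, yet the country totals, which share a common population factor, end up strongly correlated. Logs are used so that the wide range of populations does not let a few very large countries dominate the coefficients.

```r
set.seed(42)
n <- 200
# Synthetic populations spanning several orders of magnitude
population <- rlnorm(n, meanlog = 16, sdlog = 2)
# Per capita renewables and CO2, drawn independently of each other
renew_pc <- rlnorm(n, meanlog = 0, sdlog = 0.3)
co2_pc   <- rlnorm(n, meanlog = 0, sdlog = 0.3)

# Totals inherit the common population factor
renew_total <- population * renew_pc
co2_total   <- population * co2_pc

r_total      <- cor(log(renew_total), log(co2_total))  # strong, driven by population
r_per_capita <- cor(log(renew_pc), log(co2_pc))        # near zero by construction
c(total = r_total, per_capita = r_per_capita)
```

Comparing per capita values removes the shared population factor, which is why the per capita view is the more informative one for judging whether renewables and emissions move together.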

Amid these challenges, however, some nations stand out. Iceland and Norway in particular combine high per capita renewable energy generation with relatively low per capita CO2 emissions. Sweden and Switzerland are also positive outliers, albeit to a lesser extent. These cases underscore the potential of robust renewable energy strategies, implemented by wealthy nations, as effective measures to mitigate carbon emissions. Nevertheless, these isolated instances do not provide enough evidence to draw global-scale conclusions about the impact of renewable energy generation on CO2 emissions.

### PER CAPITA AVG RENEWABLE VS CO2

# Calculate average renewable energy production across the years
avg_renewablesall <- all_df %>%
  group_by(country) %>%
  summarize(
    avg_wind = mean(wind_generation, na.rm = TRUE),
    avg_solar = mean(solar_generation, na.rm = TRUE),
    avg_geo = mean(geo_generation, na.rm = TRUE),
    avg_hydro = mean(hydro_generation, na.rm = TRUE),
    avg_nuclear = mean(nuclear_generation, na.rm = TRUE)
  ) %>%
  mutate(
    avg_renewables = rowMeans(select(., starts_with("avg")), na.rm = TRUE)
  )

# Calculate average CO2 emissions across the years
avg_co2 <- all_df %>%
  group_by(country) %>%
  summarize(avg_CO2 = mean(CO2_emissions, na.rm = TRUE))

# Merge the two datasets
merged_data <- merge(avg_renewablesall, avg_co2, by = "country")

# Create a scatter plot with country labels and a linear trend line
avg2 <- ggplot(merged_data, aes(x = avg_renewables, y = avg_CO2, label = country)) +
  geom_point() +
  geom_text_repel(box.padding = 0.5, segment.size = 0.2, size = 3) +  # Non-overlapping country labels
  geom_smooth(method = "lm", se = FALSE, color = "blue") +  # Add a linear regression line
  labs(
    title = "Per Capita Average : Renewable Energy Generation vs CO2 Emissions",
    x = "Average Per Capita Renewable Energy Generation (TWh)",
    y = "Average Per Capita CO2 Emissions (Tons)"
  ) +
  theme_minimal() + 
  theme(
    plot.title = element_text(size = 11),
    axis.text.y = element_text(size = 6),
    axis.text.x = element_text(size = 6),
    axis.title.y = element_text(size = 8),
    axis.title.x = element_text(size = 8)
  )
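Iceland and Norway stand out because they lie far from the fitted line. One way to make such "positive outliers" precise is to flag observations with large standardized residuals. The sketch below applies base R's rstandard() to a small synthetic dataset with one planted outlier; the cutoff of 2 is a common rule of thumb and an assumption here, not a threshold used in the analysis.

```r
# Synthetic data: a linear trend with small alternating noise
x <- 1:20
y <- 2 * x + rep(c(-0.5, 0.5), 10)
# Plant one outlier well below the trend
y[10] <- y[10] - 30

fit <- lm(y ~ x)
std_res <- rstandard(fit)

# Observations whose standardized residual exceeds 2 in absolute value
flagged <- which(abs(std_res) > 2)
flagged
```

On the project data, the same idea would apply to a model such as lm(avg_CO2 ~ avg_renewables, data = merged_data), with the flagged row indices mapped back to country names.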

The following graph shows considerable randomness in the data. Although aggregate renewable energy production increases almost every year (nearly all points lie above 0% on the y-axis), there is no apparent trend toward decreasing CO2 emissions: the points are spread roughly evenly on either side of 0% on the x-axis. This relationship may be influenced by numerous factors, including the energy-intensive nature of building renewable infrastructure. Because we are still in the adoption phase, a substantial portion of the energy used for infrastructure development still comes from CO2-emitting sources. This complexity underscores the evolving dynamics of the renewable energy landscape.

### YEARLY CHANGE CO2 VS RENEW

# Calculate yearly totals and their year-over-year rates of change (%)
raw_df_rate_change <- raw_df %>%
  group_by(year) %>%
  summarize(
    total_renewable_energy = sum(wind_generation, solar_generation, geo_generation, hydro_generation, nuclear_generation),
    # CO2_emissions is per capita, so convert to totals before summing across countries
    CO2_emissions = sum(CO2_emissions * Population)
  ) %>%
  arrange(year) %>%
  mutate(
    # Compare each year with the previous one on the same row
    rate_change_renewable = (total_renewable_energy - lag(total_renewable_energy)) / lag(total_renewable_energy) * 100,
    CO2_emissions_change = (CO2_emissions - lag(CO2_emissions)) / lag(CO2_emissions) * 100
  )

# Create a scatter plot of yearly changes with a line of best fit
c <- ggplot(raw_df_rate_change, aes(x = CO2_emissions_change, y = rate_change_renewable, label = year)) +
  geom_point() +
  geom_text_repel(box.padding = 0.5, segment.size = 0.2) + # Non-overlapping year labels
  geom_smooth(method = "lm", se = FALSE, color = "blue", linewidth = 0.5) +
  labs(
    title = "Yearly change : Renewable energy generation vs CO2 emissions",
    x = "Yearly Rate of Change of CO2 Emissions (%)",
    y = "Yearly Rate of Change of Total Renewable Energy (%)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 11),
    axis.text.y = element_text(size = 6),
    axis.text.x = element_text(size = 6),
    axis.title.y = element_text(size = 8),
    axis.title.x = element_text(size = 8)
  )
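A year-over-year percentage change divides each year's difference by the previous year's value, and both must sit on the same row. The base R helper below is a minimal sketch of that calculation; the name pct_change is hypothetical, and on data sorted by year it gives the same result as (x - dplyr::lag(x)) / dplyr::lag(x) * 100.

```r
# Hypothetical helper: percentage change relative to the previous element.
# The first element has no predecessor, so its change is NA.
pct_change <- function(x) {
  prev <- c(NA, head(x, -1))
  (x - prev) / prev * 100
}

pct_change(c(100, 110, 99))
```

Here the result is NA, 10 and -10: the first year has no predecessor, 110 is 10% above 100, and 99 is 10% below 110.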

At the country level, we can still identify cases in which an increase in renewable energy is accompanied by a decrease in CO2 emissions. However, even among the countries with the highest cumulative renewable energy generation per capita, it remains difficult to discern a significant overall trend. The plot comparing the evolution of per capita CO2 emissions and per capita renewable energy generation for these high-generation countries reveals a nuanced picture. Switzerland and France experienced a reduction in CO2 emissions alongside increased renewable energy generation, while Iceland, to a lesser extent, showed a notable slowdown in emissions growth despite a substantial increase in renewable energy. For most countries, however, no clear-cut relationship emerges. Our analysis covered both high- and low-generation countries, and if a definitive negative relationship between renewable energy generation and CO2 emissions is not apparent even among the nations with the highest renewable generation, it is even less apparent among countries with lower renewable generation.

### FACET GRID PER COUNTRY RENEW VS CO2

# Filter out the year 2022 as we don't have that observation for CO2
all_df_filtered <- all_df %>%
  filter(year != 2022)

# Calculate total renewable energy production for each country and year
total_renewables <- all_df_filtered %>%
  group_by(country, year) %>%
  summarize(
    total_renewables = sum(wind_generation, solar_generation, geo_generation, hydro_generation, nuclear_generation, na.rm = TRUE)
  ) %>%
  ungroup()

# Calculate CO2 emissions for each country and year
co2_emissions <- all_df_filtered %>%
  group_by(country, year) %>%
  summarize(CO2_emissions = sum(CO2_emissions, na.rm = TRUE)) %>%
  ungroup()

# Merge the total renewables data with the CO2 emissions data
merged_data <- merge(total_renewables, co2_emissions, by = c("country", "year"))

# Identify the 9 countries with the highest total renewable energy production
top_countries <- merged_data %>%
  group_by(country) %>%
  summarize(total_renewables = sum(total_renewables, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(total_renewables)) %>%
  slice_head(n = 9) %>%
  pull(country)

# Filter the data for only these top countries
filtered_data <- merged_data %>%
  filter(country %in% top_countries)

# Scale each series by its maximum across the selected countries so both fit on one
# y-axis (the secondary axis is an identity transform that only adds a second label)
max_CO2 <- max(filtered_data$CO2_emissions, na.rm = TRUE)
max_renewables <- max(filtered_data$total_renewables, na.rm = TRUE)
filtered_data <- filtered_data %>%
  mutate(norm_CO2 = CO2_emissions / max_CO2,
         norm_renewables = total_renewables / max_renewables)

fg1 <- ggplot(filtered_data, aes(x = year)) +
  geom_line(aes(y = norm_renewables, group = country, color = country, linetype = "Renewable Energy")) +
  geom_line(aes(y = norm_CO2, group = country, color = country, linetype = "CO2 Emissions"), color = "grey") +
  facet_wrap(~ country, scales = 'free_x') + # Allows each country to have its own x-axis scale
  scale_y_continuous(
    "Normalized Renewable Energy Generation",
    sec.axis = sec_axis(~ ., name = "Normalized CO2 Emissions")
  ) +
  labs(
    title = "Per capita change : Renewable Energy Generation vs CO2 Emissions",
    x = "Year"
  ) +
  scale_color_manual(values = scales::hue_pal()(length(unique(filtered_data$country))),
                     breaks = unique(filtered_data$country),
                     labels = unique(filtered_data$country),
                     name = "Country") +
  scale_linetype_manual(values = c("Renewable Energy" = "solid", "CO2 Emissions" = "dashed"),
                        name = "Legend") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 6),
    axis.text.y = element_text(size = 6),
    plot.title = element_text(hjust = 0.5, size = 10),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

Note: For each of the nine countries with the highest cumulative renewable energy generation, this plot shows normalized per capita renewable energy generation alongside normalized per capita CO2 emissions over time.
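The plot divides each series by a single maximum taken over all nine countries. That keeps levels comparable across panels, but countries with small values can look flat. If the goal is instead to compare the shape of each country's trajectory, normalizing within each country is an alternative. The sketch below contrasts the two with base R's ave() on made-up numbers; it is an illustration, not the method used for the plot.

```r
# Made-up values for two countries with very different scales
values  <- c(1, 2, 4, 10, 20, 40)
country <- c("A", "A", "A", "B", "B", "B")

# Global normalization: one maximum for all countries
global_norm <- values / max(values)

# Per-country normalization: each country divided by its own maximum
within_norm <- ave(values, country, FUN = function(v) v / max(v))

rbind(global_norm, within_norm)
```

Under global normalization, country A never rises above 0.1, whereas per-country normalization shows that A and B follow the same relative path. In dplyr, the per-country version would be group_by(country) followed by mutate() with x / max(x).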

Based on the comprehensive analysis of the relationship between renewable energy sources and CO2 emissions, it becomes evident that there is no discernible reduction in CO2 emissions directly attributable to the increased generation of renewable energy. These findings indicate that, at least for the present, it is not possible to definitively conclude that renewable energy generation significantly contributes to a reduction in CO2 emissions on a global scale. While renewable energy remains a crucial component of sustainable practices, the complex interplay of various factors affecting emissions necessitates a more nuanced understanding of the dynamics involved in achieving substantial reductions in CO2 emissions.

3.6.2 Renewable Energies’ impact on PM2.5 Exposure

What about the relationship between renewable energy generation and PM2.5 exposure? Here the association appears more promising: there is a clear negative relationship between the two variables, suggesting that an increase in renewable energy could potentially lead to lower PM2.5 exposure. Nevertheless, it is important to keep perspective. The countries with both the highest renewable energy generation and the lowest PM2.5 exposure, such as Sweden, Norway, Iceland, and Canada, are mainly Western countries with superior infrastructure, effective air quality management, and well-defined energy policies. Therefore, while the association between renewable energy generation and reduced PM2.5 exposure is encouraging, part of it may reflect other factors that our data do not fully capture.

###  AVG RENEWABLE VS AVG PM2.5

# Calculate average renewable energy production across the years
avg_renewablesall <- all_df %>%
  group_by(country) %>%
  summarize(
    avg_wind = mean(wind_generation, na.rm = TRUE),
    avg_solar = mean(solar_generation, na.rm = TRUE),
    avg_geo = mean(geo_generation, na.rm = TRUE),
    avg_hydro = mean(hydro_generation, na.rm = TRUE),
    avg_nuclear = mean(nuclear_generation, na.rm = TRUE)
  ) %>%
  mutate(
    avg_renewables = rowMeans(select(., starts_with("avg")), na.rm = TRUE)
  )

# Calculate average PM exposure across the years
avg_pm <- raw_df %>%
  group_by(country) %>%
  summarize(avg_PM_exposure = mean(PM_exposure, na.rm = TRUE))

# Merge the two datasets
merged_data_pm <- merge(avg_renewablesall, avg_pm, by = "country")

# Create a scatter plot with country labels and a linear trend line
avg3 <- ggplot(merged_data_pm, aes(x = avg_renewables, y = avg_PM_exposure, label = country)) +
  geom_point() +
  geom_text_repel(aes(label = country), box.padding = 0.5, segment.size = 0.2, size = 3) +  
  geom_smooth(method = "lm", se = FALSE, color = "blue") +  # Add a linear regression line
  labs(
    title = "Average renewable energy per capita vs Average PM exposure",
    x = "Average Renewable Energy Per Capita",
    y = "Average PM Exposure"
  ) +
  theme_minimal() + 
  theme(
    plot.title = element_text(size = 11),
    axis.text.y = element_text(size = 6),
    axis.text.x = element_text(size = 6),
    axis.title.y = element_text(size = 8),
    axis.title.x = element_text(size = 8)
  )
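A linear fit on a scatter plot like this can be driven by a handful of countries with very high renewable generation. A rank-based measure such as Spearman's correlation, available in base R through cor(..., method = "spearman") and cor.test(), is one possible robustness check; it is not part of the original analysis. The toy example shows how a single extreme value affects Pearson's coefficient far more than Spearman's.

```r
# Toy data: y decreases perfectly with x, but one x value is extreme
x <- c(1:9, 100)
y <- 10:1

r_pearson  <- cor(x, y)                        # pulled toward 0 by the extreme point
r_spearman <- cor(x, y, method = "spearman")   # uses ranks only, so stays at -1

c(pearson = r_pearson, spearman = r_spearman)
```

Applied to the plot above, cor.test(merged_data_pm$avg_renewables, merged_data_pm$avg_PM_exposure, method = "spearman") would report the rank correlation together with a p-value.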

Examining individual countries reveals a consistent pattern of decreasing PM exposure alongside high renewable energy generation. The pattern is evident among the countries with the highest levels of renewable energy generation, as illustrated in the subsequent plot. It suggests that nations with substantial renewable energy production consistently experience a reduction in PM exposure, in line with the previous graph.

### FACET GRID PER COUNTRY RENEW VS PM2.5 with high renew gen

# Years for which we have PM exposure data
years_with_data <- c(1990, 1995, 2000, 2005, 2010:2019)

# Calculate total renewable energy production for each country and year
total_renewables <- all_df %>%
  group_by(country, year) %>%
  summarize(
    total_renewables = sum(wind_generation, solar_generation, geo_generation, hydro_generation, nuclear_generation, na.rm = TRUE)
  ) %>%
  ungroup()

# Filter out the years we don't have data for PM exposure
raw_df_filtered <- raw_df %>%
  filter(year %in% years_with_data)

# Calculate PM exposure for each country and year
pm_exposure <- raw_df_filtered %>%
  group_by(country, year) %>%
  summarize(PM_exposure = mean(PM_exposure, na.rm = TRUE)) %>%
  ungroup()

# Merge the total renewables data with the PM exposure data
merged_data <- merge(total_renewables, pm_exposure, by = c("country", "year"))

# Identify the 9 countries with the highest total renewable energy production
top_countries <- merged_data %>%
  group_by(country) %>%
  summarize(total_renewables = sum(total_renewables, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(total_renewables)) %>%
  slice_head(n = 9) %>%
  pull(country)

# Filter the data for only these top countries
filtered_data <- merged_data %>%
  filter(country %in% top_countries)

# Scale each series by its maximum across the selected countries so both fit on one
# y-axis (the secondary axis is an identity transform that only adds a second label)
max_PM <- max(filtered_data$PM_exposure, na.rm = TRUE)
max_renewables <- max(filtered_data$total_renewables, na.rm = TRUE)
filtered_data <- filtered_data %>%
  mutate(norm_PM = PM_exposure / max_PM,
         norm_renewables = total_renewables / max_renewables)

# Plot normalized renewable generation (solid) and PM exposure (dashed), one panel per country
fg2 <- ggplot(filtered_data, aes(x = year)) +
  geom_line(aes(y = norm_renewables, group = country, color = country, linetype = "Renewable Energy")) +
  geom_line(aes(y = norm_PM, group = country, color = country, linetype = "PM Exposure"), color = "grey") +
  facet_wrap(~ country, scales = 'free_x') + # Allows each country to have its own x-axis scale
  scale_y_continuous(
    "Normalized Renewable Energy Generation",
    sec.axis = sec_axis(~ ., name = "Normalized PM Exposure")
  ) +
  labs(
    title = "Renewable energy generation vs PM exposure for highest renewable energy generation",
    x = "Year"
  ) +
  scale_color_manual(values = scales::hue_pal()(length(unique(filtered_data$country))),
                     breaks = unique(filtered_data$country),
                     labels = unique(filtered_data$country),
                     name = "Country") +
  scale_linetype_manual(values = c("Renewable Energy" = "solid", "PM Exposure" = "dashed"),
                        name = "Type") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 6),
    axis.text.y = element_text(size = 6),
    plot.title = element_text(hjust = 0.2, size = 10),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

Note: For each of the nine countries with the highest cumulative renewable energy generation, this plot shows normalized per capita renewable energy generation alongside normalized PM exposure over time.

Nevertheless, the countries with the lowest renewable energy generation show no substantial decline in PM2.5 exposure; their PM levels are stable or rising. This suggests that low renewable energy generation is associated with constant or even increasing PM exposure. Low renewable generation is unlikely to be the sole explanation, however: many of these nations are either heavy CO2 emitters with known pollution problems or developing countries with less advanced infrastructure, energy management practices, and technological resources.

### FACET GRID PER COUNTRY RENEW VS PM2.5 with low renew gen

# Years for which we have PM exposure data
years_with_data <- c(1990, 1995, 2000, 2005, 2010:2019)

# Calculate total renewable energy production for each country and year
total_renewables <- all_df %>%
  group_by(country, year) %>%
  summarize(
    total_renewables = sum(wind_generation, solar_generation, geo_generation, hydro_generation, nuclear_generation, na.rm = TRUE)
  ) %>%
  ungroup()

# Filter out the years we don't have data for PM exposure
raw_df_filtered <- raw_df %>%
  filter(year %in% years_with_data)

# Calculate PM exposure for each country and year
pm_exposure <- raw_df_filtered %>%
  group_by(country, year) %>%
  summarize(PM_exposure = mean(PM_exposure, na.rm = TRUE)) %>%
  ungroup()

# Merge the total renewables data with the PM exposure data
merged_data <- merge(total_renewables, pm_exposure, by = c("country", "year"))

# Identify the 9 countries with the lowest total renewable energy production
top_countries <- merged_data %>%
  group_by(country) %>%
  summarize(total_renewables = sum(total_renewables, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(total_renewables)) %>%
  slice_tail(n = 9) %>%
  pull(country)

# Filter the data for only these low-generation countries
filtered_data <- merged_data %>%
  filter(country %in% top_countries)

# Scale each series by its maximum across the selected countries so both fit on one
# y-axis (the secondary axis is an identity transform that only adds a second label)
max_PM <- max(filtered_data$PM_exposure, na.rm = TRUE)
max_renewables <- max(filtered_data$total_renewables, na.rm = TRUE)
filtered_data <- filtered_data %>%
  mutate(norm_PM = PM_exposure / max_PM,
         norm_renewables = total_renewables / max_renewables)

# Plot normalized renewable generation (solid) and PM exposure (dashed), one panel per country
fg3 <- ggplot(filtered_data, aes(x = year)) +
  geom_line(aes(y = norm_renewables, group = country, color = country, linetype = "Renewable Energy")) +
  geom_line(aes(y = norm_PM, group = country, color = country, linetype = "PM Exposure"), color = "grey") +
  facet_wrap(~ country, scales = 'free_x') + # Allows each country to have its own x-axis scale
  scale_y_continuous(
    "Normalized Renewable Energy Generation",
    sec.axis = sec_axis(~ ., name = "Normalized PM Exposure")
  ) +
  labs(
    title = "Renewable energy generation vs PM exposure for lowest renewable energy generation",
    x = "Year"
  ) +
  scale_color_manual(values = scales::hue_pal()(length(unique(filtered_data$country))),
                     breaks = unique(filtered_data$country),
                     labels = unique(filtered_data$country),
                     name = "Country") +
  scale_linetype_manual(values = c("Renewable Energy" = "solid", "PM Exposure" = "dashed"),
                        name = "Type") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 6),
    axis.text.y = element_text(size = 6),
    plot.title = element_text(hjust = 0.2, size = 10),
    axis.title.y = element_text(size = 9),
    axis.title.x = element_text(size = 9),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 9),
    legend.key.size = unit(0.4, "cm"),  
    legend.key.height = unit(0.4, "cm")
    )

Note: For each of the nine countries with the lowest cumulative renewable energy generation, this plot shows normalized per capita renewable energy generation alongside normalized PM exposure over time.

We observe an encouraging trend where PM2.5 exposure decreases as renewable energy generation increases, particularly in well-developed countries such as Sweden, Iceland, and Canada, where advanced technologies and energy policies are in place. This trend, while not as evident in less developed nations, offers a hopeful outlook, suggesting that with mindful energy generation and management, these countries could potentially replicate the positive outcomes seen in their more developed counterparts.

4 Analysis

Moving to the analysis phase, we go beyond basic and multivariate comparisons to examine our dataset in more depth. The visual trends highlighted a global transition toward sustainable energy sources, driven by a combination of environmental awareness, policy incentives, and technological advances. While hydroelectric power dominated the early years, later decades saw the rise of nuclear, solar, and wind, amid continuous growth in emissions. We now turn to a more nuanced exploration of the evolving relationship between renewable energy, CO2 emissions, and air quality, with the aim of better understanding the forces shaping the sustainable energy landscape.

4.1 The Models

The correlation matrix in section 3.1.3 provided a preliminary observation: each renewable source is negatively correlated with CO2 emissions, to varying degrees. The correlation coefficients are Wind (-0.188), Solar (-0.089), Geo (-0.025), Hydro (-0.04), and Nuclear (-0.156), and Wind and Nuclear stand out as statistically significant. Even so, these correlations are modest in magnitude, which leaves room for deeper exploration.
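Correlation coefficients of this kind, together with the p-values used to judge their significance, can be obtained in base R with cor.test(). The sketch below runs it on synthetic data (x_wind and y_co2 are made-up stand-ins, not the dataset's columns). With a few thousand observations, even a coefficient of around -0.15 is highly significant, which is why statistical significance and magnitude need to be read separately.

```r
set.seed(1)
n <- 3000
# Synthetic stand-ins with a weak negative relationship built in
x_wind <- rnorm(n)
y_co2  <- -0.15 * x_wind + rnorm(n)

ct <- cor.test(x_wind, y_co2, method = "pearson")
ct$estimate  # sample Pearson correlation (weakly negative)
ct$p.value   # very small despite the modest magnitude
```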

Our investigation of the relationship between CO2 emissions and renewables continues. The analysis in section 3.6.1, “Renewable Energies’ impact on CO2 emissions”, tells an interesting story: contrary to expectations, higher renewable energy production coincides with higher CO2 emissions. Notable anomalies, mostly small wealthy nations, depart from this general trend, either by lying far from the CO2 per capita level the trend would predict or by exhibiting unusually high renewable energy production per capita.

4.1.1 Basic Multiple Linear Regression

To address our first research question, “To what extent does sustainable energy generation impact carbon dioxide emissions?”, we began with a basic multiple linear regression on raw_df, log-transforming each country’s total CO2 emissions and its generation from each of the five energy sources.

\[\log(\mathrm{CO2}) = \beta_0 + \beta_1 \log(\mathrm{wind}) + \beta_2 \log(\mathrm{solar}) + \beta_3 \log(\mathrm{geo}) + \beta_4 \log(\mathrm{hydro}) + \beta_5 \log(\mathrm{nuclear}) + \epsilon\]
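Where a country generates no energy from a given source in a given year, log() returns -Inf, and the model code replaces these values with 0. Since log(1) = 0, this places zero generation at the same value as 1 TWh. The short sketch below makes that consequence explicit on made-up numbers and shows log1p(), that is log(1 + x), as one commonly used alternative; it is not what the original model uses.

```r
generation <- c(0, 1, 10)  # made-up generation values in TWh

# Approach used in the model: take logs, then replace -Inf with 0
log_replaced <- log(generation)
log_replaced[is.infinite(log_replaced)] <- 0

# Alternative: log(1 + x), finite at zero and keeps 0 and 1 TWh distinct
log_shifted <- log1p(generation)

rbind(log_replaced, log_shifted)
```

The first two columns of log_replaced are identical, while log_shifted still distinguishes them. The trade-off is that log1p() slightly changes the interpretation of the coefficients for small values.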

### MODEL 1 CO2 ###

log_raw <- raw_df %>%
  mutate(
    log_wind = log(wind_generation),
    log_solar = log(solar_generation),
    log_geo = log(geo_generation),
    log_hydro = log(hydro_generation),
    log_nuclear = log(nuclear_generation),
    log_CO2 = log(CO2_emissions * Population)
  )

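# log(0) = -Inf where a country generates nothing from a source; these values are
# set to 0, which treats zero generation like 1 TWh (since log(1) = 0)
log_raw <- log_raw %>%
  mutate(across(-country, ~ ifelse(is.infinite(.x), 0, .x)))
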
# Fit a multiple linear regression model
model <- lm(log_CO2 ~ log_wind + log_solar + log_geo + log_hydro + log_nuclear, data = log_raw)
CO2 Basic Logarithmic Multiple Linear Regression
Dependent variable: log_CO2

Variable              Estimate       (Std. Error)
log_wind              0.025**        (0.012)
log_solar             -0.014         (0.010)
log_geo               0.108***       (0.015)
log_hydro             0.215***       (0.010)
log_nuclear           0.341***       (0.014)
Constant              17.600***      (0.029)

Observations          3,250
R²                    0.390
Adjusted R²           0.389
Residual Std. Error   1.240 (df = 3244)
F Statistic           416.000*** (df = 5; 3244)
Note: *p<0.1; **p<0.05; ***p<0.01

The basic model already yields notable insights. Wind, geothermal, hydro, and nuclear energies have p-values below 0.05, indicating a statistically significant association with CO2 emissions. Solar energy's p-value exceeds this threshold, so its association with CO2 emissions is not significant in this model.

All of the statistically significant coefficients are positive, and those for hydro (0.215) and nuclear (0.341) are the largest, so these two sources are the most strongly associated with higher CO2 emissions. The small coefficients for wind and solar energy may reflect that these sources only became globally widespread after the turn of the century.

Although the model's R-squared of approximately 39% suggests some explanatory power, we remain cautious about inferring causation from this model alone. The model captures a substantial share of the variation in CO2 emissions, but the broader socio-economic context plays a pivotal role: society's continual pursuit of production and economic expansion inherently brings higher CO2 emissions, and the simultaneous integration of new technologies, including renewable energies, further complicates the relationship between environmental factors and human activities.
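The transformation step in the Model 1 code maps log(0) = -Inf to 0. Because log(1) is also 0, a country-year with no generation from a source becomes indistinguishable from one producing exactly one unit. The short base-R sketch below, on made-up values, shows this. The log1p() and indicator lines are only illustrative alternatives; they are not what was used to produce the tables in this report.

```r
# Made-up generation values for one source (illustration only)
gen <- c(0, 1, 10, 100)

# Transformation as in the report: take logs, then replace -Inf (from log(0)) by 0
log_gen <- log(gen)
log_gen[is.infinite(log_gen)] <- 0
print(log_gen)   # the first two entries (0 and 1 units) both become 0

# Illustrative alternative, not used in this report: log(1 + x) is defined at 0
# and keeps zero output distinct from small positive output
log1p_gen <- log1p(gen)
print(log1p_gen)

# Another common option: keep the log and add an indicator for zero output
has_gen <- as.integer(gen > 0)
print(has_gen)
```

Which option is preferable presumably depends on how many zero observations each source has; wind and solar, which only spread widely after 2000, are the most likely to contribute zeros in early years.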

4.1.2 Fixed Time Effect Model

A fixed time effect model suits our panel dataset because it controls for factors that vary across the observed years but are shared by all countries, and that could otherwise distort the relationship between renewable energy variables and CO2 emissions. With effect = "time", the ‘within’ estimator subtracts each year's cross-country mean from every variable, removing year-specific effects; the coefficients are then identified from differences between countries within the same year. This specification does not remove country-specific effects.
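To make concrete what a time fixed-effects within estimator does, the base-R sketch below uses simulated data (all names and numbers are invented) to show that demeaning every variable by its year mean gives the same slope as an ordinary regression with one dummy per year. It does not call plm; the equivalence follows from the Frisch-Waugh-Lovell theorem.

```r
set.seed(1)
n_country <- 20
n_year <- 10

# Simulated balanced panel: 20 countries x 10 years
panel <- expand.grid(country = paste0("C", 1:n_country),
                     year = 1990 + 0:(n_year - 1))
year_shock <- rnorm(n_year)[panel$year - 1989]   # shock shared by all countries in a year
panel$x <- rnorm(nrow(panel)) + year_shock
panel$y <- 0.3 * panel$x + 2 * year_shock + rnorm(nrow(panel), sd = 0.5)

# Pooled OLS ignores the year shocks, so its slope absorbs part of their effect
fit_pooled <- lm(y ~ x, data = panel)

# (a) Least-squares dummy variables: a separate intercept for each year
fit_dummies <- lm(y ~ x + factor(year), data = panel)

# (b) Within transformation: subtract each year's cross-country mean
panel$y_w <- panel$y - ave(panel$y, panel$year)
panel$x_w <- panel$x - ave(panel$x, panel$year)
fit_within <- lm(y_w ~ x_w - 1, data = panel)

slope_dummies <- unname(coef(fit_dummies)["x"])
slope_within <- unname(coef(fit_within)["x_w"])
print(c(pooled = unname(coef(fit_pooled)["x"]),
        dummies = slope_dummies, within = slope_within))
```

In plm, setting effect = "individual" or effect = "twoways" would instead, or additionally, remove country-specific effects.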

Code
### MODEL 2 CO2###

library(plm)

# Declare the panel structure: country is the individual index, year the time index
log_raw_p <- pdata.frame(log_raw, index = c("country", "year"))

# Model with time (year) fixed effects
modelplm <- plm(log_CO2 ~ log_wind + log_solar + log_geo + log_hydro + log_nuclear,
                data = log_raw_p,
                effect = "time",   # year fixed effects only; country effects are not removed
                model = "within")  # within estimator: variables are demeaned by year
CO2 Fixed Time Logarithmic Multiple Linear Regression
Dependent variable: log_CO2
Coefficient (standard error)

log_wind 0.029** (0.012)
log_solar 0.008 (0.011)
log_geo 0.109*** (0.015)
log_hydro 0.215*** (0.010)
log_nuclear 0.311*** (0.014)

Observations 3,250
R2 0.364
Adjusted R2 0.355
F Statistic 367.000*** (df = 5; 3200)
Note: *p<0.1; **p<0.05; ***p<0.01

The new model yields familiar results. Wind, geothermal, hydro, and nuclear energies again have coefficients with p-values below 0.05, indicating statistically significant associations with CO2 emissions, while solar remains the only source without a significant effect.

Hydro and nuclear energies continue to show the largest positive coefficients, geothermal remains positive, and wind energy shows a positive but smaller effect, mirroring the findings of the basic model. The R-squared is slightly lower, at approximately 36.4%, but the included variables still explain a substantial portion of the variance in CO2 emissions. Because this R-squared is computed after the year means have been removed, it is not strictly comparable with the 39% of the basic model.

4.1.3 Dataframe Adjustment

Having settled on the fixed time effect model as the most suitable, we next assessed the relevance of the data we were using, in particular for the results on solar and wind generation. We therefore switched from the broader raw_df to the raw_1990 dataset, which covers 1990 onwards. This narrower timeframe captures more relevant and representative information on the current state of the various energy sources, and it supports a more consistent comparison across countries and energy types during a period marked by increased global attention to sustainable practices and major advances in renewable energy technologies. The raw_1990 dataset therefore offers a sharper view of how renewable energy relates to CO2 emissions.
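Model 3 repeats the Model 1 transformation on a second data frame, so a small helper could remove the duplication. The sketch below is hypothetical: add_log_columns() is not part of the original code, the toy data frame is invented, and it assumes that raw_1990 corresponds to raw_df restricted to year >= 1990 (the construction of raw_1990 is not shown in this section). It keeps the original rule of replacing infinite log values with 0, applied here to the new log columns only.

```r
# Hypothetical helper: adds the log columns used in the CO2 models
add_log_columns <- function(df) {
  sources <- c("wind", "solar", "geo", "hydro", "nuclear")
  for (s in sources) {
    df[[paste0("log_", s)]] <- log(df[[paste0(s, "_generation")]])
  }
  df$log_CO2 <- log(df$CO2_emissions * df$Population)  # per capita x population = total
  # Same rule as the original code: log(0) = -Inf is replaced by 0
  log_cols <- c(paste0("log_", sources), "log_CO2")
  df[log_cols] <- lapply(df[log_cols], function(v) ifelse(is.infinite(v), 0, v))
  df
}

# Invented toy data in the same shape as raw_df (values are not real)
toy_raw <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year = rep(c(1985, 1990, 1995), times = 2),
  wind_generation = c(0, 0.5, 2, 0, 0, 1),
  solar_generation = c(0, 0, 0.1, 0, 0.2, 0.4),
  geo_generation = c(1, 1, 1, 0, 0, 0),
  hydro_generation = c(5, 6, 7, 20, 21, 22),
  nuclear_generation = c(0, 10, 12, 30, 31, 29),
  CO2_emissions = c(8, 8.5, 9, 5, 5.2, 5.1),
  Population = c(1e6, 1.1e6, 1.2e6, 5e6, 5.1e6, 5.2e6)
)

# Assumed construction of the post-1990 sample, then the shared transformation
toy_1990 <- add_log_columns(subset(toy_raw, year >= 1990))
print(toy_1990[, c("country", "year", "log_wind", "log_CO2")])
```

Under that assumption, both samples would come from the same call, add_log_columns(raw_df) and add_log_columns(raw_1990), so the two models cannot drift apart through a copy-paste error.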

Code
### MODEL 3 CO2###

# Same log transformation as Model 1, applied to the post-1990 data
log_raw1990 <- raw_1990 %>%
  mutate(
    log_wind = log(wind_generation),
    log_solar = log(solar_generation),
    log_geo = log(geo_generation),
    log_hydro = log(hydro_generation),
    log_nuclear = log(nuclear_generation),
    log_CO2 = log(CO2_emissions * Population)
  )

# Replace -Inf from log(0) with 0 in every numeric column, as in Model 1
log_raw1990 <- log_raw1990 %>%
  mutate(across(where(is.numeric), ~ ifelse(is.infinite(.x), 0, .x)))

# Declare the panel structure (country = individual index, year = time index)
log_raw1990_p <- pdata.frame(log_raw1990, index = c("country", "year"))

# Model with time (year) fixed effects on the post-1990 sample
modelplm1990 <- plm(log_CO2 ~ log_wind + log_solar + log_geo + log_hydro + log_nuclear,
                    data = log_raw1990_p,
                    effect = "time",   # year fixed effects only
                    model = "within")  # within estimator: variables are demeaned by year
CO2 Fixed Time Logarithmic Multiple Linear Regression Since 1990
Dependent variable: log_CO2
Coefficient (standard error)

log_wind 0.039*** (0.011)
log_solar 0.009 (0.010)
log_geo 0.097*** (0.015)
log_hydro 0.175*** (0.012)
log_nuclear 0.305*** (0.015)

Observations 2,002
R2 0.413
Adjusted R2 0.404
F Statistic 277.000*** (df = 5; 1971)
Note: *p<0.1; **p<0.05; ***p<0.01

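Below, we assess the association between each energy source and CO2 emissions. Because both the outcome and the regressors are in natural logarithms, each coefficient is approximately an elasticity: a 1% increase in a source's generation is associated with roughly a β% change in CO2 emissions, holding the other sources and the year fixed.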

  • log_wind: For a one-unit increase in the natural logarithm of wind energy generation, the natural logarithm of CO2 emissions is estimated to increase by approximately 0.0395 units. This positive coefficient suggests that higher wind energy generation is associated with an increase in CO2 emissions.

  • log_solar: For a one-unit increase in the natural logarithm of solar energy generation, the natural logarithm of CO2 emissions is estimated to increase by approximately 0.0087 units. However, it is not statistically significant (p-value = 0.4), suggesting that solar energy generation might not have a substantial impact on CO2 emissions in this model.

  • log_geo: For a one-unit increase in the natural logarithm of geothermal energy generation, the natural logarithm of CO2 emissions is estimated to increase by approximately 0.0972 units. This positive coefficient suggests that higher geothermal energy generation is associated with an increase in CO2 emissions.

  • log_hydro: For a one-unit increase in the natural logarithm of hydro energy generation, the natural logarithm of CO2 emissions is estimated to increase by approximately 0.1746 units. This positive coefficient implies that increased hydro energy generation is associated with a notable increase in CO2 emissions.

  • log_nuclear: For a one-unit increase in the natural logarithm of nuclear energy generation, the natural logarithm of CO2 emissions is estimated to increase by approximately 0.3054 units. This positive coefficient suggests that higher nuclear energy generation is associated with a substantial increase in CO2 emissions.
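Since the Model 3 variables are in logs, the coefficients translate directly into percentage changes. The sketch below converts the rounded point estimates from the table above into the implied percentage change in CO2 emissions for a 10% increase in one source's generation, holding the other sources and the year fixed. It uses only the reported coefficients; the 10% step is an arbitrary illustrative choice, and the results are associations, not causal effects.

```r
# Rounded Model 3 point estimates from the table above
beta <- c(wind = 0.039, solar = 0.009, geo = 0.097, hydro = 0.175, nuclear = 0.305)

# In a log-log model, raising generation by pct percent multiplies CO2 by (1 + pct/100)^beta
implied_pct_change <- function(beta, pct) 100 * ((1 + pct / 100)^beta - 1)

# Roughly 0.37, 0.09, 0.93, 1.68 and 2.95 percent for a 10% increase
print(round(implied_pct_change(beta, 10), 2))
```

For small changes, the simpler approximation β × pct is close to the exact value: for nuclear at 10%, 0.305 × 10 = 3.05% against about 2.95%.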

The time fixed-effects analysis shows that all sustainable energy sources except solar are significantly associated with carbon dioxide emissions. With an R-squared of approximately 41.3%, the model explains around 41.3% of the variation in CO2 emissions (after removing year effects) using the wind, solar, geothermal, hydro, and nuclear energy variables. This points to a clear positive association between the adoption of these sustainable energy sources and higher carbon dioxide emissions, although, as with the earlier models, it does not establish causation.

4.1.4 Interpretation

The observation that increasing renewable energy generation is associated with increased CO2 emissions in our model might seem counterintuitive at first glance, but several factors could contribute to this phenomenon:

Infrastructure Development: The initial stages of adopting renewable energy often involve significant infrastructure development, which can be energy-intensive and may rely on conventional energy sources. The manufacturing, installation, and setup of renewable energy facilities, such as wind farms, solar arrays, nuclear power plants, and dams, can significantly contribute to CO2 emissions.

Economic Growth: Increased adoption of renewable energy is occurring simultaneously with economic growth. As economies expand, there is often a rise in energy demand, leading to increased overall energy production, including both renewable and non-renewable sources.

Global Supply Chain: The global supply chain for renewable technologies involves transportation and manufacturing processes that contribute to CO2 emissions. The extraction of raw materials for renewable technologies also has environmental impacts.

Lag in Emission Reduction: The transition to renewable energy is a process, and the full benefits in terms of reduced CO2 emissions might take time to materialize. During the transition, there may be a period where the growth in renewable energy is still accompanied by the use of conventional energy sources.

It’s essential to consider these nuances and contextual factors when interpreting the relationship between renewable energy generation and CO2 emissions. As renewable energy infrastructure becomes more established and technology improves, we may see reductions in CO2 emissions.
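One further mechanism, offered here as a hypothesis rather than something tested in this report, is scale: the models use total emissions and total generation and control only for the year, so larger economies, which tend to produce more electricity from every source and also emit more, can push every coefficient upward even if generation itself has no effect on emissions. The stylized base-R simulation below uses invented data; log_size is a hypothetical stand-in for economic size. It shows a positive log-log coefficient arising purely from such a common driver, and vanishing once the driver is controlled for.

```r
set.seed(42)
n <- 2000

log_size <- rnorm(n, mean = 0, sd = 1.5)   # hypothetical log economic size
log_gen <- log_size + rnorm(n, sd = 0.5)    # generation scales with size
log_co2 <- log_size + rnorm(n, sd = 0.5)    # emissions scale with size; generation has no effect

naive <- lm(log_co2 ~ log_gen)               # no control for size
adjusted <- lm(log_co2 ~ log_gen + log_size) # size held fixed

# The naive slope comes out near 0.9 (= 2.25 / 2.5 in expectation); the adjusted slope near 0
print(round(c(naive = coef(naive)[["log_gen"]],
              adjusted = coef(adjusted)[["log_gen"]]), 3))
```

Per capita variables, or country fixed effects, are common ways to reduce such scale differences, although country effects only remove the part of size that does not change over time.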

4.2 Modeling PM 2.5

Next, we investigated whether renewable energy improves air quality in countries transitioning to cleaner sources of energy. In this model, we did not take logarithms of the energy sources or of PM2.5 exposure; keeping the original scale makes the coefficients easier to interpret. We kept raw_1990 as the dataset because our PM_exposure data are only available from 1990 onwards, and we continued to use the time fixed-effects model.

Code
### MODEL 1 PM###

# Model with time fixed effect, with the energy variables in their original units.
# log_raw1990_p still contains the untransformed raw_1990 columns, which are used here.
modelPMp <- plm(PM_exposure ~ wind_generation + solar_generation + geo_generation + hydro_generation + nuclear_generation,
                data = log_raw1990_p,
                effect = "time",   # year fixed effects only
                model = "within")  # within estimator: variables are demeaned by year
PM2.5 Fixed Time Multiple Linear Regression Since 1990
Dependent variable: PM_exposure
Coefficient (standard error)

wind_generation 0.148*** (0.044)
solar_generation -0.044 (0.085)
geo_generation -0.287*** (0.079)
hydro_generation 0.012** (0.006)
nuclear_generation -0.030*** (0.007)

Observations 1,078
R2 0.056
Adjusted R2 0.040
F Statistic 12.500*** (df = 5; 1059)
Note: *p<0.1; **p<0.05; ***p<0.01

Prior to discussing the modest R-squared value, let’s first examine the coefficients of the model:

  • wind_generation: A one-unit increase in wind energy generation is associated with a 0.1478-unit increase in PM2.5 exposure. This suggests that higher wind energy generation is linked to a moderate increase in PM2.5 exposure.

  • solar_generation: The coefficient for solar generation is -0.0440, indicating a slight negative association. However, it is not statistically significant (p-value = 0.605), suggesting that solar energy generation might not have a substantial impact on PM2.5 exposure in this model.

  • geo_generation: A one-unit increase in geothermal energy generation is associated with a 0.2872-unit decrease in PM2.5 exposure. This implies that higher geothermal energy generation is linked to a reduction in PM2.5 exposure.

  • hydro_generation: A one-unit increase in hydro energy generation is associated with a 0.0122-unit increase in PM2.5 exposure. This coefficient is statistically significant (p-value = 0.049), suggesting a modest positive association between hydro energy generation and PM2.5 exposure.

  • nuclear_generation: A one-unit increase in nuclear energy generation is associated with a 0.0303-unit decrease in PM2.5 exposure. This coefficient is statistically significant (p-value < 0.001), indicating that higher nuclear energy generation is linked to a reduction in PM2.5 exposure.

However, the R-squared value of 0.0557 suggests that the energy generation variables explain only about 5.57% of the variation in PM2.5 exposure. This low R-squared value could be attributed to several factors:

Missing Variables: There might be many unaccounted-for factors influencing PM2.5 exposure that are not included in our model.

Complexity of PM2.5 Formation: PM2.5 is influenced by various sources, including industrial activities, transportation, and meteorological conditions. The model does not capture the full complexity of these interactions.

Spatial and Temporal Variability: PM2.5 levels can vary significantly across regions within a country and over the course of a year. Because our data are annual country-level values, the model may not fully capture this variability.

While the R-squared value is low, the individual coefficients still provide valuable insights into the specific potential impacts of each energy generation type on PM2.5 exposure. However, further exploration and consideration of additional factors are needed to enhance the model’s explanatory capacity.
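The PM2.5 coefficients are per unit of generation, but typical generation differs greatly between sources: geothermal output is small in most countries, while nuclear output can be large. One common way to compare sources on a common footing, not used in this report, is to rescale each coefficient by sd(x)/sd(y), which gives the change in PM2.5 (in standard deviations) per one-standard-deviation change in generation. The sketch below applies this to an invented data set whose true per-unit slopes are borrowed from the geothermal and nuclear estimates in the table above; the generation distributions and noise are made up.

```r
# Standardized coefficients for a fitted lm: beta_j * sd(x_j) / sd(y)
standardized_coef <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]   # regressors without the intercept column
  y <- model.response(model.frame(fit))
  coef(fit)[colnames(X)] * apply(X, 2, sd) / sd(y)
}

set.seed(7)
n <- 500
toy <- data.frame(
  geo = rexp(n, rate = 5),          # invented: small values (mean 0.2)
  nuclear = rexp(n, rate = 0.05)    # invented: much larger values (mean 20)
)
# True per-unit slopes taken from the PM2.5 table; everything else is invented
toy$pm <- 30 - 0.287 * toy$geo - 0.030 * toy$nuclear + rnorm(n, sd = 0.2)

fit <- lm(pm ~ geo + nuclear, data = toy)
print(round(coef(fit)[-1], 3))            # per unit, the geothermal slope is the larger one
print(round(standardized_coef(fit), 3))   # per standard deviation, nuclear matters more here
```

Which scale is more informative depends on the question: per-unit coefficients describe the effect of one more unit of generation, while standardized ones describe typical variation in the sample.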

4.3 Answering The Research Questions

4.3.1 To what extent does sustainable energy generation impact carbon dioxide emissions?

Our analysis uncovers a positive association between sustainable energy generation and carbon dioxide (CO2) emissions. Counterintuitively, higher renewable energy production goes together with higher CO2 emissions, although notable exceptions linked to country size and wealth, such as the Nordic countries, challenge this overall trend. The positive association may reflect that countries with strong renewable energy generation also have developed technologies, industries, and economies. Such development often raises energy demand, and the growth in renewable sources may not immediately offset the accompanying rise in industrial emissions. It is therefore plausible that the infrastructural and economic advances of these nations raise CO2 emissions in the short term as they transition towards greener practices. Finally, our chosen model explains approximately 41.3% of the variance in CO2 emissions and confirms a positive association between the adoption of sustainable energy and higher carbon dioxide emissions.

4.3.2 What are the key factors and types of sustainable energy sources that have the most significant impact on reducing CO2 emissions?

Wind, geothermal, hydro, and nuclear energies show statistically significant positive coefficients with CO2 emissions. Solar energy also has a positive coefficient, but it is smaller and not statistically significant. None of the sources is therefore associated with lower CO2 emissions in our models. Hydro and nuclear have the largest coefficients and have historically been the most represented of the energy variables in our data. Beyond our model variables, the exploratory analysis revealed other important factors. Country size emerges as a pivotal factor in the efficacy of CO2 reduction efforts: larger nations, with extensive resources and diverse energy needs, may find it harder to transition quickly to renewable sources, which can affect the overall success of emission reduction initiatives. Economic and development considerations also play an important role, as the magnitude of a country's economic growth can influence its ability to invest in and adopt sustainable energy technologies. These findings point to the need for tailored strategies and policy frameworks that account for each nation's circumstances in fostering effective and sustainable CO2 reduction.

4.3.3 How does sustainable energy generation impact the quality of air (reduction of PM 2.5)?

According to our modeling, wind and hydro energy show positive associations with PM2.5 exposure, suggesting a potential trade-off between renewable energy adoption and air quality. Solar, geothermal, and nuclear show negative associations, but the solar coefficient is not statistically significant and the nuclear coefficient, while significant, is small per unit of generation. Only geothermal energy is linked to a substantial per-unit reduction in PM2.5 exposure, suggesting a possible benefit for air quality.

The low R-squared of 0.0557 indicates that most of the variation in PM2.5 exposure is driven by factors our model omits. For this reason, we cannot assert that renewable energy generation has a considerable influence on PM2.5 exposure levels. The limitation highlighted in section 3.4 also applies: assessing air quality through country-wide averages across a year may oversimplify it. These factors caution against drawing definitive conclusions and underscore the need for a more nuanced understanding of the relationship between sustainable energy generation and air quality.

4.3.4 Is there a temporal trend between the growth of sustainable energy generation and the reduction in CO2 emissions and what implications does this trend hold for future sustainability efforts?

The transition to renewable energy is a gradual and complex process, and future sustainability efforts may contribute to reductions in CO2 emissions as technology advances and renewable infrastructure becomes more established. However, the limitations of our current modeling prevent a definitive answer to this question. Our exploratory analysis helps fill in the picture: the line plots show per capita emissions falling, especially in developed countries, but it would be unwarranted to attribute this solely to the increased adoption of renewables, since growing populations and efficiency gains also play influential roles. We believe that by 2050, with mature solar and wind energy markets and expanded production worldwide, the relationships we have evaluated may become clearer.
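The lag hypothesis from section 4.1.4 could be probed by relating emissions to generation from earlier years within each country, which this report does not do. The base-R sketch below shows one way to build such a lagged regressor. lag_within() and the toy panel are hypothetical, and the helper assumes rows are sorted by year within each country with no missing years.

```r
# Hypothetical helper: lag x by k rows within each group (NA where no earlier value exists)
lag_within <- function(x, group, k = 1) {
  ave(x, group, FUN = function(v) {
    if (length(v) <= k) return(rep(NA_real_, length(v)))
    c(rep(NA_real_, k), head(v, -k))
  })
}

# Invented toy panel: two countries, four years each
panel <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year = rep(2000:2003, times = 2),
  gen = c(1, 2, 3, 4, 10, 20, 30, 40)
)
panel <- panel[order(panel$country, panel$year), ]   # required: sorted by year within country
panel$gen_lag1 <- lag_within(panel$gen, panel$country, k = 1)
print(panel)   # gen_lag1 is NA for each country's first year
```

Lagged columns built this way (for example a hypothetical log_wind_lag1) could then enter the same time fixed-effects specification, at the cost of losing each country's first k years.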

5 Conclusion

5.1 Results

In summary, our analysis provides insights into how sustainable energy generation relates to carbon dioxide (CO2) emissions and air quality. Contrary to expectations, it reveals a positive association between annual renewable energy production and annual CO2 emissions: all sources except solar had significant positive coefficients in our models, meaning that higher renewable energy generation is associated with higher CO2 emissions. Our graphical analysis also highlights the global growth in the adoption of renewables, with powerhouse countries such as China and the US leading the way in scaling up, while wealthy smaller nations, such as the Scandinavian countries, achieve higher green energy production per capita. Although global CO2 emissions are still rising, per capita emissions appear to be falling in developed countries. PM2.5 levels show slight decreases, but insufficient data are available to establish a definitive trend.

5.2 Limitations

Despite these findings, our analysis has several limitations that merit consideration. Causation remains difficult to establish definitively because of the many omitted factors influencing CO2 emissions and air quality. Economic variables, such as GDP, trade, and consumption, play pivotal roles in CO2 emissions and were not comprehensively accounted for in our models. The reliance on country-level values for PM2.5 introduces biases, as these values might not capture localized variations in air quality within countries. Additionally, the limited number of years of available data restricts the depth of our temporal analysis and hinders the ability to draw robust conclusions about long-term trends. Future research with more extensive datasets and economic variables can enhance our understanding of the complex interplay between sustainable energy adoption, emissions, and air quality.

5.3 Future work

To enhance our understanding, future research should adopt a more granular approach, exploring country-specific relationships and the impact of specific policies and technological advancements over time. A closer examination of regional variations and the role of government interventions could help explain why some countries deviate from the overall pattern. Continuous monitoring and regular updates to our dataset will be vital for keeping up with evolving trends in renewable energy adoption and understanding their broader implications. Additionally, investigating the socio-economic factors that mediate the relationship between renewable energy adoption and environmental outcomes could offer a more comprehensive perspective.

6 References

Databases:

Renewable energy generation, Energy Institute: https://www.energyinst.org/statistical-review/resources-and-data-downloads

CO2 emissions per capita, Climate Watch: https://www.climatewatchdata.org/data-explorer

PM2.5 exposure level, World Bank: https://databank.worldbank.org/reports.aspx?source=2%20&series=EN.ATM.PM25.MC.M3&country=#

Population per country, World Bank: https://databank.worldbank.org/reports.aspx?source=2&series=SP.POP.TOTL&country=#

Kuwait oil fires in 1991, NASA Visible Earth: https://visibleearth.nasa.gov/images/78594/kuwait-oil-fires